Claims



1. A protein structure comprising a plurality of first peptide monomer units arranged in a
first strand and a plurality of second peptide monomer units arranged in a second
strand wherein the first and second monomer units comprise the heptad repeat motif
(abcdefg) and/or the hendecad repeat motif (abcdefghijk), and wherein a pair of
asparagines, arginines, lysines or other complementary residues in the "a" position on
at least one pair of corresponding first and second monomer units ensures that the first
strand and the second strand form a staggered parallel heterodimer coiled coil
structure.

2. A protein structure according to claim 1, wherein a first peptide monomer unit in the
first strand extends beyond a corresponding second peptide monomer unit in the
second strand in the direction of the strands.

3. A protein structure according to any one of claims 1 to 2 in which at least one
charged amino acid residue of a first peptide monomer unit is arranged to attract an
oppositely-charged amino acid residue of a second peptide monomer unit.

4. A protein structure according to claim 3 in which the charged amino acid residue is in
an end portion of the first peptide monomer unit which extends beyond the
corresponding second peptide monomer unit in the second strand.

5. A protein structure according to any one of the preceding claims in which at least one
strand consists solely of first or second peptide monomer units respectively.

6. A protein structure according to any one of the preceding claims wherein one or more
of the other "a" positions of the first and second monomer units is a hydrophobic
residue.

7. A protein structure according to claim 6, wherein the hydrophobic residue is selected
from isoleucine or valine. 



8. A protein structure according to any one of the preceding claims having a leucine at 
one or more of the "d" positions of the first and second monomer units.



9. A protein structure according to any one of the preceding claims having
oppositely-charged or otherwise complementary residues at positions g and e of
respective monomer units. 



10. A protein structure according to claim 9 in which the oppositely-charged residues are 
glutamic acid and lysine residues or arginine and aspartic acid residues, or synthetic 
derivatives of these amino acid residues. 



11. A protein structure according to any preceding claim in which the structure is
stabilised by pairs of asparagine, arginine, lysine or other complementary residues
provided by corresponding first and second peptide monomer units. 



12. A protein structure according to any preceding claim which is arranged to form a
tubular structure. 



13. A protein structure according to claim 12 in which the repeat motifs are offset by two 
or more amino acid positions in sequence whereby the peptide monomer units form a 
cylinder. 



14. A protein structure according to any preceding claim in which the first and second 
peptide monomer units have the sequence:



a) KIAALKQKIASLKQEIDALEYENDALEQ (SAF-p1) and

b) KIRALKAKNAHLKQEIAALEQEIAALEQ (SAF-p2) respectively; or

c) KIAALKQKIAALKQEIDALEYENDALEQ (SAF-p1A) and

d) KIRALKWKNAHLKQEIAALEQEIAALEQ (SAF-p2C) respectively; or

e) KIAALKQKIASLKQEIDALEYENDALEQ (SAF-p1C) and

f) KIRALKWKNAHLKQEIAALEQEIAALEQ (SAF-p2C) respectively.

15. A peptide monomer unit for use in preparing a protein structure the peptide monomer 
unit having an amino acid sequence selected from: 

a) KIAALKQKIASLKQEIDALEYENDALEQ (SAF-p1);

b) KIRALKAKNAHLKQEIAALEQEIAALEQ (SAF-p2);

c) KIAALKQKIAALKQEIDALEYENDALEQ (SAF-p1A);

d) KIRALKWKNAHLKQEIAALEQEIAALEQ (SAF-p2C); and

e) KIAALKQKIASLKQEIDALEYENDALEQ (SAF-p1C).

16. A protein structure according to any one of claims 1 to 14 or a peptide monomer unit 
according to claim 15 wherein at least one amino acid residue is derivatised. 

17. A branching self-assembling fibre comprising two or more protein structures 
according to any one of claims 1 to 11, coupled together to form a T-shaped
conjugated structure. 

18. The branching self-assembling fibre of claim 17, wherein at least one of the protein
structures comprises one or more central cysteine residues, and at least one other 
protein structure comprises a terminal cysteine residue. 

19. A method of producing protein structures, the method comprising providing a mixture 
of first and second monomer units which associate to form a protein structure 
according to any one of claims 1 to 14, wherein the first and second monomer units
comprise the heptad repeat motif (abcdefg) and/or the hendecad repeat motif
(abcdefghijk).

20. A method according to claim 19 in which the protein structure is derivatised. 

21. A method according to claim 19 or 20 in which the protein structure is stabilised by
cross-linking. 

22. A protein fibre produced by an association of protein structures according to any one 
of claims 1 to 14. 

23. A kit for making a protein structure, the kit comprising first and second peptide 
monomer units which associate to form a protein structure according to any one of 
claims 1 to 14 or a protein fibre according to claim 22, wherein the first and second 
monomer units comprise the heptad repeat motif (abcdefg) and/or the hendecad repeat
motif (abcdefghijk).

24. A two dimensional grid comprising a protein structure according to any one of claims 
1 to 14 or a protein fibre according to claim 22. 

25. A three dimensional matrix comprising a protein structure according to any one of 
claims 1 to 14 or a protein fibre according to claim 22. 

26. A matrix according to claim 25 which is arranged to assemble in solution. 

27. A matrix according to claim 25 or claim 26, wherein one or more binders is fused to
the protein structure, wherein the one or more binders are aligned to give high
avidities for one or more target entities.

28. A matrix according to any one of claims 25 to 27 which is arranged to bind one or
more target entities.

29. A matrix according to claim 28 which is arranged to bind viruses.

30. A method of forming a matrix according to any one of claims 25 to 29 in which a
mixture of separate first and second monomer units is provided, wherein the first and
second monomer units comprise the heptad repeat motif (abcdefg) and/or the
hendecad repeat motif (abcdefghijk) and are caused to associate to form a plurality of
protein structures according to any one of claims 1 to 14, wherein the protein
structures assemble to form a three-dimensional matrix.

31. A method according to claim 30 in which the matrix is formed in situ.

32. A method for controlling the production of a synthetic polymer comprising
assembling a protein structure in accordance to any one of claims 1 to 14 in
association with the polymer.

33. A method according to claim 32 in which the protein structure is removed after
synthesis of the polymer.

34. A tip for use in Atomic Force Microscopy comprising a protein structure according to
any one of claims 1 to 14.

principle of using "sticky ends" is well developed in molecular biology for assembling DNA
(S. J. Palmer et al (1998) Nucleic Acids Res. 26, 2560), and has been used to design intricate 
DNA crystals (E. Winfree et al (1998) Nature 394, 539). However, to our knowledge, our 
application of sticky end-directed molecular assembly to peptides is new; although we do note 
that head-to-tail packing of helices has been observed in recently solved crystal structures for 
two designer peptides (N. L. Ogihara et al (1997) Protein Sci. 6, 80; G. G. Prive et al (1999)
Protein Sci. 8, 1400). These were helical peptides that crystallised with their helical ends in
contact so as to form pseudo-continuous helices in the solid state. In other words they formed 
"blunt-ended" arrangements. 

According to one aspect of the invention there is provided a protein structure comprising a 
plurality of first peptide monomer units arranged in a first strand and a plurality of second 
peptide monomer units arranged in a second strand, the strands preferably forming a coiled- 
coil structure, and in which a first peptide monomer unit in the first strand extends beyond a 
corresponding second peptide monomer unit in the second strand in the direction of the 
strands. The protein structures of the invention have numerous advantages. For example, 
relatively long protein fibres can be formed with little material - 1 µl of a 100 µM solution of
the peptide monomers may provide enough material to form 10 m of fibre 50 nm thick. 

At least one charged amino acid residue of the first peptide monomer unit may be arranged to 
attract an oppositely-charged amino acid residue of the second peptide monomer unit. 
Preferably, the charged amino acid residue is in an end portion of the first peptide monomer 
unit, which extends beyond the corresponding second peptide monomer unit in the second 
strand. At least one strand may consist solely of first or second peptide monomer units 
respectively, i.e. homogeneous strands. Heterologous strands are also contemplated. The
peptide monomer units may comprise a repeating structural unit. Preferably, the repeating 
structural unit comprises a heptad repeat motif, having the pattern: 

hpphppp 
abcdefg

Preferably, the repeat may include isoleucine or asparagine at position a and leucine at
position d. Other repeats (e.g. hendecads - abcdefghijk) and amino acid compositions may
also be used (see WO99/11774).
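
By way of illustration only, the heptad pattern can be applied to a peptide sequence with a short Python sketch. It labels each residue of SAF-p1 and SAF-p2 (sequences as given in the claims) with a heptad position and collects the residues found at a and d. The choice of position g for the first residue is an assumption made for this sketch, chosen because it places isoleucine or asparagine at a and leucine at d; the register is not stated in this passage. The function names are hypothetical.

```python
HEPTAD = "abcdefg"

# Sequences as listed in the claims (one-letter amino acid codes).
SAF_P1 = "KIAALKQKIASLKQEIDALEYENDALEQ"
SAF_P2 = "KIRALKAKNAHLKQEIAALEQEIAALEQ"


def heptad_register(sequence, start="g"):
    """Pair every residue with a heptad position.

    `start` is the heptad position assigned to the first residue.
    The default "g" is an assumption of this sketch, not a value
    taken from the text.
    """
    offset = HEPTAD.index(start)
    return [(res, HEPTAD[(offset + i) % 7]) for i, res in enumerate(sequence)]


def residues_at(sequence, position, start="g"):
    """Return the residues that fall on one heptad position, in order."""
    return "".join(
        res for res, pos in heptad_register(sequence, start) if pos == position
    )


if __name__ == "__main__":
    for name, seq in (("SAF-p1", SAF_P1), ("SAF-p2", SAF_P2)):
        print(name, "a:", residues_at(seq, "a"), "d:", residues_at(seq, "d"))
```

Under the assumed register each peptide carries a single asparagine at an a position, in the fourth heptad of SAF-p1 and the second heptad of SAF-p2, with leucine at every d position.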

Fig. 7 is an electron micrograph showing fibres which have been derivatised through the
inclusion of fluorophores; and

Fig. 8 shows amino acid sequences designed to form blunt-ended heterodimers. 
1) Peptide Design and Synthesis 

Various peptide monomer units were designed as described above. The monomers and 
capping peptides (designed to complement the sticky ends of the monomers so as to produce 
flush, or blunt ends and, so, arrest longitudinal fibre assembly) are set out in Table 1:

Claims

1. A protein structure comprising a plurality of first peptide monomer units arranged in a 
first strand and a plurality of second peptide monomer units arranged in a second strand 
in which a first peptide monomer unit in the first strand extends beyond a 
corresponding second peptide monomer unit in the second strand in the direction of the 
strands. 

2. A protein structure according to claim 1 in which the strands together form a coiled coil 
structure. 

3. A protein structure according to claim 1 or 2 in which at least one charged amino acid 
residue of a first peptide monomer unit is arranged to attract an oppositely-charged 
amino acid residue of a second peptide monomer unit. 

4. A protein structure according to claim 3 in which the charged amino acid residue is in 
an end portion of the first peptide monomer unit which extends beyond the 
corresponding second peptide monomer unit in the second strand. 

5. A polypeptide structure according to any preceding claim in which at least one strand 
consists solely of first or second peptide monomer units respectively. 

6. A protein structure according to any preceding claim in which the peptide monomer 
units comprise a repeating structural unit. 

7. A protein structure according to claim 6 in which the repeating structural unit 
comprises a heptad repeat motif (abcdefg). 

8. A protein structure according to claim 6 in which the repeating structural unit 
comprises a hendecad repeat motif (abcdefghijk).

9. A protein structure according to claim 6 having isoleucine or asparagine at position a
and leucine at position d. 

10. A protein structure according to claim 6 having valine or leucine at positions a and d 
respectively. 

11. A protein structure according to any one of claims 7 to 10 having oppositely-charged or
otherwise complementary residues at positions g and e of respective monomer units. 

12. A protein structure according to claim 11 in which the oppositely-charged residues are
glutamic acid and lysine residues or asparagine and aspartic acid residues, or synthetic 
derivatives of these amino acid residues. 

13. A protein structure according to any preceding claim in which the structure is stabilised 
by pairs of asparagine, arginine, lysine or other complementary residues provided by 
corresponding first and second peptide monomer units. 

14. A protein structure according to any preceding claim which is arranged to form a 
tubular structure. 

15. A protein structure according to claim 14 in which the peptide monomer units are offset 
by two or more amino acid positions in sequence whereby the peptide monomer units 
form a cylinder. 

16. A protein structure according to any preceding claim in which the first and second 
peptide monomer units have the sequence: 

a) KIAALKQKIASLKQEIDALEYENDALEQ (SAF-p1) and

b) KIRALKAKNAHLKQEIAALEQEIAALEQ (SAF-p2) respectively; or

c) KIAALKQKIAALKQEIDALEYENDALEQ (SAF-p1A) and

d) KIRALKWKNAHLKQEIAALEQEIAALEQ (SAF-p2C) respectively; or

e) KIAALKQKIASLKQEIDALEYENDALEQ (SAF-p1C) and

f) KIRALKWKNAHLKQEIAALEQEIAALEQ (SAF-p2C) respectively.

17. A peptide monomer unit for use in preparing a protein structure the peptide monomer 
unit having an amino acid sequence selected from: 

a) KIAALKQKIASLKQEIDALEYENDALEQ (SAF-p1);

b) KIRALKAKNAHLKQEIAALEQEIAALEQ (SAF-p2);

c) KIAALKQKIAALKQEIDALEYENDALEQ (SAF-p1A);

d) KIRALKWKNAHLKQEIAALEQEIAALEQ (SAF-p2C);

e) KIAALKQKIASLKQEIDALEYENDALEQ (SAF-p1C); and

d) KIRALKWKNAHLKQEIAALEQEIAALEQ (SAF-p2C).

18. A protein structure or peptide monomer unit according to any preceding claim in which 
at least one amino acid residue is derivatised. 

19. A method of producing protein structures, the method comprising providing a mixture 
of first and second peptide monomer units which associate to form a protein structure 
according to any preceding claim. 

20. A method according to claim 19 in which the protein structure is derivatised. 

21. A method according to claim 19 or 20 in which the protein structure is stabilised by
cross-linking. 

22. Protein fibres produced by an association of protein structures according to any one of
claims 1 to 3 or a method according to claim 19, 20 or 21. 

23. A kit for making protein structures, the kit comprising first and second peptide 
monomer units which associate to form a protein structure according to any one of 
claims 1 to 13 or protein fibres according to claim 22. 



24. A two dimensional matrix comprising a protein structure according to any one of 
claims 1 to 13 or protein fibres according to claim 22. 

25. A three dimensional grid comprising a protein structure according to any one of claims 
1 to 13 or protein fibres according to claim 21. 

26. A matrix according to claim 25 which is arranged to assemble in solution. 

27. A matrix according to claim 25 or 26 which is arranged to bind a target entity. 

28. A matrix according to claim 27 which is arranged to bind viruses. 

29. A method of forming a matrix according to any one of claims 25 to 28 in which a 
mixture of separate first and second monomer units is provided and are then caused to 
associate to form a protein structure in accordance with the invention, an accumulation
of such protein structures assembling in turn to form a three dimensional matrix. 

30. A method according to claim 29 in which the matrix is formed in situ. 

31. A method for controlling the production of a synthetic polymer comprising assembling
a protein structure in accordance to any one of claims 1 to 16 in association with the 
polymer. 

32. A method according to claim 31 in which the protein structure is removed after
synthesis of the polymer. 

33. A tip for use in Atomic Force Microscopy comprising a protein structure according to 
any one of claims 1 to 16. 

EXAMINATION REPORT

I. Basis of the report

1. With regard to the elements of the international application (Replacement sheets which have been furnished to
the receiving Office in response to an invitation under Article 14 are referred to in this report as "originally filed"
and are not annexed to this report since they do not contain amendments (Rules 70.16 and 70.17)):
Description, pages: 

1, 2, 4-7, 9-24 as originally filed

8 as received on 10/01/2001 with letter of 08/01/2001

3, 3a with telefax of 05/12/2001

Claims, No.:

1-34 with telefax of 05/12/2001

Drawings, sheets:

1/10-10/10 as originally filed

Sequence listing part of the description, pages:

1-5, filed with the letter of 09.11.2000

2. With regard to the language, all the elements marked above were available or furnished to this Authority in the 
language in which the international application was filed, unless otherwise indicated under this item 

These elements were available or furnished to this Authority in the following language: , which is: 

□ the language of a translation furnished for the purposes of the international search (under Rule 23.1(b)).

□ the language of publication of the international application (under Rule 48.3(b)). 

□ the language of a translation furnished for the purposes of international preliminary examination (under Rule 
55.2 and/or 55.3). 

3. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the 
international preliminary examination was carried out on the basis of the sequence listing: 

□ contained in the international application in written form. 

□ filed together with the international application in computer readable form. 

□ furnished subsequently to this Authority in written form. 

□ furnished subsequently to this Authority in computer readable form. 

☒ The statement that the subsequently furnished written sequence listing does not go beyond the disclosure in
the international application as filed has been furnished.

☒ The statement that the information recorded in computer readable form is identical to the written sequence
listing has been furnished.



4. The amendments have resulted in the cancellation of:

□ the description, pages:

□ the claims, Nos.:

□ the drawings, sheets:

5. □ This report has been established as if (some of) the amendments had not been made, since they have been
considered to go beyond the disclosure as filed (Rule 70.2(c)):

(Any replacement sheet containing such amendments must be referred to under item 1 and annexed to this 
report.) 

6. Additional observations, if necessary: 
II. Priority 

1. □ This report has been established as if no priority had been claimed due to the failure to furnish within the
prescribed time limit the requested:

□ copy of the earlier application whose priority has been claimed. 

□ translation of the earlier application whose priority has been claimed. 

2. ☒ This report has been established as if no priority had been claimed due to the fact that the priority claim has
been found invalid.

Thus for the purposes of this report, the international filing date indicated above is considered to be the relevant 
date. 

3. Additional observations, if necessary: 
see separate sheet 

IV. Lack of unity of invention 

1. In response to the invitation to restrict or pay additional fees the applicant has:

□ restricted the claims. 

□ paid additional fees. 

□ paid additional fees under protest. 

□ neither restricted nor paid additional fees. 

2. □ This Authority found that the requirement of unity of invention is not complied with and chose, according to Rule
68.1, not to invite the applicant to restrict or pay additional fees.

3. This Authority considers that the requirement of unity of invention in accordance with Rules 13.1, 13.2 and 13.3 is

□ complied with.

☒ not complied with for the following reasons:
see separate sheet

4. Consequently, the following parts of the international application were the subject of international preliminary
examination in establishing this report:

☒ all parts.

□ the parts relating to claims Nos. .



V. Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability; 
citations and explanations supporting such statement 

1. Statement 

Novelty (N) Yes: Claims 1-34 

No: Claims 

Inventive step (IS) Yes: Claims 1-11, 14-15 and 17-25 

No: Claims 12-13, 16 and 26-34

Industrial applicability (IA) Yes: Claims 1-34 

No: Claims 



2. Citations and explanations 
see separate sheet 



VIII. Certain observations on the international application 

The following observations on the clarity of the claims, description, and drawings or on the question whether the 
claims are fully supported by the description, are made: 
see separate sheet 

Reference is made to the following documents:

D1: US-A-5 712 366 (KAPLAN DAVID L ET AL) 27 January 1998 (1998-01-27)
D2: WO 96 11947 A (GOLDBERG EDWARD B) 25 April 1996 (1996-04-25)
D3: KOJIMA SHUICHI ET AL: 'FIBRIL FORMATION BY AN AMPHIPATHIC ALPHA- 
HELIX-FORMING POLYPEPTIDE PRODUCED BY GENE ENGINEERING.' 
PROCEEDINGS OF THE JAPAN ACADEMY SERIES B PHYSICAL AND 
BIOLOGICAL, vol. 73, no. 1, 1997, pages 7-11,1997 
D4: W A PETKA ET AL: 'REVERSIBLE HYDROGELS FROM SELF-ASSEMBLING 
ARTIFICIAL PROTEINS' SCIENCE,AAAS. LANCASTER, PA,US, vol. 281, 17 
June 1998 (1998-06-17), pages 389-392 
D5: Biochemistry 39 (June/August 2000) 8728-34; intermediate document 

The document D5 was not cited in the international search report. A copy of the 
document has not been provided, as the document is known to the inventors. 

Re Item II 

Priority 

1. The present application claims the priority date of the priority application
GB9922013.9, designated P1, having as filing date 17.09.1999. The subject-
matter of P1 is more limited in scope than the internationally filed application.

It is for example noted that present claims 12-13, 16 and 26-34, and examples 6 
and 8-9 are missing in the priority document. The subject-matter of said claims is 
therefore only entitled to the filing date of 18.09.2000, making e.g. D5 available for 
citing as a prior art document. 

Re Item IV 

Lack of unity of invention 

2. Given the partial lack of right to priority and the disclosure D5 (referring to the 
sticky-end assembly of a designed peptide fiber) it is considered that the subject- 
matter of e.g. claims 26-28 is not necessarily linked to that of claim 1.

A single general inventive concept (referred to in Rule 13 PCT and the PCT
Preliminary Examination Guidelines Ch. III, 7) is therefore not recognisable in the
absence of a common, special technical feature.

Re Item V 

Reasoned statement under Rule 66.2(a)(ii) with regard to novelty, inventive step or 
industrial applicability; citations and explanations supporting such statement 

3. The present application concerns the assembly of designed peptides with sticky- 
ends, in contrast to supramolecular assemblies made up of units having blunt 
ends (Ogihara et al.). 

D1 is a patent publication (also considered to be the closest prior art) disclosing 
assemblies of dimers having also sticky-ends (see Figure 5 concerning the 
interaction between the oppositely sequences of Fig. 4A and 4B, Figures 6A and 
B, column 6 concerning recognition elements at the N- or C-termini to align all the 
dimers in a "head-to-tail" orientation within a growing fibril, and the claims 
concerning assemblies of coiled coils with charged heptad subunits). 

New claim 1 presently contains the additional features of originally filed claims 6-8
and claim 13; it specifies that in the "a" position of at least one pair of corresponding
first and second monomer units a pair of Asn, Arg or Lys or other complementary
residues is present to ensure that the first strand and the second strand form a
staggered parallel heterodimer coiled coil structure. It is however noted that the
peptide monomers specified in claim 15 all refer to Asn in one "a" position and that
the wording "or other complementary residues" is not clearly defined. Novelty
is therefore recognised (at least when "or other complementary residues" does not
refer to hydrophobic residues) for claim 1 over D1.

4. The other relevant available documents are: 

D2 discloses in particular self-assembling polypeptides to produce protein
nanometer structures; this patent application does not relate to coiled coils made
up of units with heptad repeats (D2 concerns β-sheets), but it does concern the
tinkering with peptide sequences for better binding in the assemblies, e.g. see
page 8 "Structural units" referring to rods (strands) having positively and
negatively charged groups or protrusions built in for specific binding to other units
and figures 2-3 (in particular the closed brickwork sheets having overlapping
surfaces).

D3 discloses fibril formation (four-helix bundle) by an α-helix-forming polypeptide
having Ala at a particular position in the heptad repeat motif; no reference is made
to "sticky-ends".

D4 concerns reversible hydrogels from self-assembling recombinant proteins 
containing terminal leucine zipper domains (comprising heptad repeats and 
forming coiled-coil aggregates; protein 3 contains two helix units of 42
residues). 

5. Inventive step: None of these documents is prejudicial to the novelty of claim 1 
or any other claim and novelty is therefore acknowledged (Article 33(2) PCT). The 
involvement of an inventive step (Article 33(3) PCT) for claims 1-11, 14-15 and 
17-25 entitled to the priority date of 17.09.1999 is also acknowledged, as no
document alone or any combination of available documents suggested the 
forming of the protein structures as claimed. 

An inventive step is however not recognised for the claims not entitled to the 
priority date: D5 is in this case the closest prior art. 

The present application does not satisfy the criterion set forth in Article 33(3) PCT 
because the subject-matter of claim 16 does not involve an inventive step (Rule
65(1)(2) PCT): the feature of derivatization is known from the prior art (see e.g.
WO 99/11774, also mentioned in the present description).
D5 refers to the possibility of higher-order assemblies (page 8733, right column)
as does D1 (see the bottom of column 6). The combination of D5 with the
teaching of e.g. D1 therefore makes the subject-matter of claims 12-13 and 26-31
obvious to the skilled person.

The subject-matter of claims 32-34 appears to be based on general knowledge of 
the skilled person, and is therefore also obvious to the skilled person. 

Re Item VIII 

Certain observations on the international application 

6. Claim 1 now refers to a protein structure with at least one pair of corresponding 
first and second monomer units ensuring that the first strand and the second 
strand form a staggered parallel heterodimer coiled coil structure.

Claim 2 has now in a depending claim the feature of original claim 1 concerning 
the extension of a first monomer unit in the first strand beyond a corresponding 
second monomer unit in the second strand in the direction of the strands. In view 
of the definition of staggered heterodimer at the bottom of page 6 of the 
description on file (of the published application WO 01/21646) it is considered that 
claim 2 is superfluous. 

7. Claim 1 is considered to miss an essential technical feature of the invention 
(Article 6 PCT): as mentioned above, the residue other than the normal 
hydrophobic residue in the "a" position should only be specified as Asn. In 
addition, it is not clear what the meaning is of "other complementary residues". 

8. Claim 3 refers also to an essential technical feature of the invention. Coiled coils 
are protein-folding motifs that direct and cement a wide variety of protein-protein 
interactions (see present description page 1, line 10). The present examples also
all refer to coiled coils made up of monomer units comprising charged heptad
repeat motifs. The feature of claim 3 is therefore essential to the invention and
should be incorporated into claim 1 (to form a staggered, parallel heterodimer). 
Article 6 of the PCT requires that all independent claims contain the essential 
technical feature(s) of the invention (see also Rule 6.3(b) PCT). 

9. It appears furthermore that units SAF-p1 and SAF-p2C are identical; claim 15 is 
therefore not concise (Article 6 PCT). 
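
An observation of this kind can be checked mechanically. The Python sketch below compares the five sequences of claim 15 as they are transcribed in this file (with extraction spacing removed) and reports which names share an identical sequence and where the others differ. The dictionary and function names are hypothetical. In this transcription the only identical pair returned is SAF-p1 and SAF-p1C; whether that is the pair the observation intends cannot be settled from the text alone.

```python
from itertools import combinations

# Sequences of claim 15 as transcribed in this file; treat them as a
# transcription, not as an authoritative sequence listing.
CLAIM_15 = {
    "SAF-p1": "KIAALKQKIASLKQEIDALEYENDALEQ",
    "SAF-p2": "KIRALKAKNAHLKQEIAALEQEIAALEQ",
    "SAF-p1A": "KIAALKQKIAALKQEIDALEYENDALEQ",
    "SAF-p2C": "KIRALKWKNAHLKQEIAALEQEIAALEQ",
    "SAF-p1C": "KIAALKQKIASLKQEIDALEYENDALEQ",
}


def identical_pairs(sequences):
    """Return every pair of names whose sequences are exactly equal."""
    return [
        (a, b) for a, b in combinations(sequences, 2)
        if sequences[a] == sequences[b]
    ]


def differences(first, second):
    """List (1-based position, residue in first, residue in second)."""
    return [
        (i + 1, x, y)
        for i, (x, y) in enumerate(zip(first, second))
        if x != y
    ]


if __name__ == "__main__":
    print("identical:", identical_pairs(CLAIM_15))
    print("p2 vs p2C:", differences(CLAIM_15["SAF-p2"], CLAIM_15["SAF-p2C"]))
    print("p1 vs p1A:", differences(CLAIM_15["SAF-p1"], CLAIM_15["SAF-p1A"]))
```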

peptides at 100 µM (a width value of "x" on the histogram includes all measurements made
from "x-5" to "x").

Fig. 5 is a cartoon showing the possible anti-typic association of parallel helical peptides 
leading to a homo-oligomeric peptide nanotube. 

Fig. 6 is an x-ray diffraction pattern of an aligned protein fibre of the invention. 

Fig. 7 is an image from a confocal fluorescent microscope showing fibres which have been 
derivatised through the inclusion of fluorophores; and

Fig. 8 shows amino acid sequences designed to form blunt-ended heterodimers. 
1) Peptide Design and Synthesis 

Various peptide monomer units were designed as described above. The monomers and 
capping peptides (designed to complement the sticky ends of the monomers so as to produce 
flush, or blunt ends and, so, arrest longitudinal fibre assembly) are set out in Table 1:

principle of using "sticky ends" is well developed in molecular biology for assembling DNA
(S. J. Palmer et al (1998) Nucleic Acids Res. 26, 2560), and has been used to design intricate
DNA crystals (E. Winfree et al (1998) Nature 394, 539). However, to our knowledge, our
application of sticky end-directed molecular assembly to peptides is new; although we do note
that head-to-tail packing of helices has been observed in recently solved crystal structures for
two designer peptides (N. L. Ogihara et al (1997) Protein Sci. 6, 80; G. G. Prive et al (1999)
Protein Sci. 8, 1400). These were helical peptides that crystallised with their helical ends in
contact so as to form pseudo-continuous helices in the solid state. In other words they formed
"blunt-ended" arrangements.

US-A-5,712,366 discloses self-assembling protein material but does not provide details of
how to make a staggered parallel heterodimer. WO 96/11947 discloses protein nanostructures
based on bacteriophage T4 tail fiber proteins but does not disclose a staggered parallel 
heterodimer coiled coil structure. 

Pandya et al, Biochemistry, 39, 8728-34, 2000 (published after the priority date of the present
application) does not disclose a method of making nanotubes and does not disclose a matrix 
comprising the protein structures of the present invention. 

According to one aspect of the invention there is provided a protein structure comprising a
plurality of first peptide monomer units arranged in a first strand and a plurality of second
peptide monomer units arranged in a second strand, the strands preferably forming a
coiled-coil structure, and in which a first peptide monomer unit in the first strand extends beyond a
corresponding second peptide monomer unit in the second strand in the direction of the
strands. The protein structures of the invention have numerous advantages. For example,
relatively long protein fibres can be formed with little material: 1 µl of a 100 µM solution of
the peptide monomers may provide enough material to form 10 m of fibre 50 nm thick.

At least one charged amino acid residue of the first peptide monomer unit may be arranged to
attract an oppositely-charged amino acid residue of the second peptide monomer unit.
Preferably, the charged amino acid residue is in an end portion of the first peptide monomer
unit, which extends beyond the corresponding second peptide monomer unit in the second
strand. At least one strand may consist solely of first or second peptide monomer units
respectively, i.e. homogeneous strands. Heterologous strands are also contemplated. The
peptide monomer units may comprise a repeating structural unit. Preferably, the repeating
structural unit comprises a heptad repeat motif, having the pattern:

hpphppp 
abcdefg 

Preferably, the repeat may include isoleucine or asparagine at position a and leucine at
position d. Other repeats (e.g. hendecads, abcdefghijk) and amino acid compositions may
also be used (see WO 99/11774).
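
The heptad register described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names, the `PREFERRED` table and the demonstration sequence are hypothetical and are not taken from the sequences of the invention. The sketch labels each residue of a sequence with its heptad position (a to g) and reports whether the residues at the a and d positions match the preferences stated above (isoleucine or asparagine at a, leucine at d).

```python
HEPTAD = "abcdefg"

# Preferences stated in the text: Ile or Asn at 'a', Leu at 'd'.
PREFERRED = {"a": {"I", "N"}, "d": {"L"}}


def heptad_register(sequence, start="a"):
    """Pair each residue (one-letter code) with its heptad position.

    `start` is the heptad position assumed for the first residue.
    """
    offset = HEPTAD.index(start)
    return [(residue, HEPTAD[(offset + i) % 7]) for i, residue in enumerate(sequence)]


def core_positions(sequence, start="a"):
    """Return (index, position, residue, is_preferred) for every 'a' and 'd' site."""
    return [
        (i, pos, residue, residue in PREFERRED[pos])
        for i, (residue, pos) in enumerate(heptad_register(sequence, start))
        if pos in PREFERRED
    ]


if __name__ == "__main__":
    # Hypothetical two-heptad sequence used only for demonstration.
    demo = "IAALKQENAALKQE"
    for index, pos, residue, ok in core_positions(demo):
        print(index, pos, residue, "preferred" if ok else "other")
```

A check of this kind only inspects the register; it says nothing about whether a given peptide actually folds or pairs as intended.
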

MATERIALS FOR THE PRODUCTION OF
NANOMETER STRUCTURES AND USE THEREOF

FIELD OF THE INVENTION

The present invention pertains to nanostructures,
i.e., nanometer sized structures useful in the construction
of microscopic and macroscopic structures. In particular,
the present invention pertains to nanostructures based on
bacteriophage T4 tail fiber proteins and variants thereof.

BACKGROUND OF THE INVENTION

While the strength of most metallic and ceramic
based materials derives from the theoretical bonding
strengths between their component molecules and crystallite
surfaces, it is significantly limited by flaws in their
crystal or glass-like structures. These flaws are usually
inherent in the raw materials themselves or developed during
fabrication and are often expanded due to exposure to
environmental stresses.

The emerging field of nanotechnology has made the
limitations of traditional materials more critical. The
ability to design and produce very small structures (i.e., of
nanometer dimensions) that can serve complex functions
depends upon the use of appropriate materials that can be
manipulated in predictable and reproducible ways, and that
have the properties required for each novel application.

Biological systems serve as a paradigm for
sophisticated nanostructures. Living cells fabricate proteins
and combine them into structures that are perfectly formed
and can resist damage in their normal environment. In some
cases, intricate structures are created by a process of
self-assembly, the instructions for which are built into the
component polypeptides. Finally, proteins are subject to
proofreading processes that ensure a high degree of quality
control.

Therefore, there is a need in the art for methods
and compositions that exploit these unique features of
proteins to form constituents of synthetic nanostructures.
The need is to design materials whose properties can be
tailored to suit the particular requirements of
nanometer-scale technology. Moreover, since the subunits of
most macrostructural materials, ceramics, metals, fibers,
etc., are based on the bonding of nanostructural subunits,
the fabrication of appropriate subunits without flaws and of
exact dimensions and uniformity should improve the strength
and consistency of the macrostructures because the surfaces
are more regular and can interact more closely over an
extended area than larger, more heterogeneous material.

SUMMARY OF THE INVENTION

In one aspect, the present invention provides
isolated protein building blocks for nanostructures,
comprising modified tail fiber proteins of bacteriophage T4.
The gp34, 36, and 37 proteins are modified in various ways to
form novel rod structures with different properties.
Specific internal peptide sequences may be deleted without
affecting their ability to form dimers and associate with
their natural tail fiber partners. Alternatively, they may
be modified so that they: interact only with other modified,
and not native, tail fiber partners; exhibit thermolabile
interactions with their partners; or contain additional
functional groups that enable them to interact with
heterologous binding moieties.

The present invention also encompasses fusion
proteins that contain sequences from two or more different
tail fiber proteins. The gp35 protein, which forms an angle
joint, is modified so as to form average angles different
from the natural average angle of 137° (±7°) or 156° (±12°),
and to exhibit thermolabile interactions with its partners.

In another aspect, the present invention provides
nanostructures comprising native and modified tail fiber
proteins of bacteriophage T4. The nanostructures may be
one-dimensional rods, two-dimensional polygons or open or closed
sheets, or three-dimensional open cages or closed solids.

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1A and 1B show a schematic representation
of the T4 bacteriophage particle (Figure 1A), and a schematic
representation of the T4 bacteriophage tail fiber (Figure 1B).

Figure 2 shows a schematic representation of a unit rod.

Figures 3A-3D show schematic representations of: a
one-dimensional multi-unit rod joined along the x axis
(Figure 3A); closed simple sheets (Figure 3B); closed
brickwork sheets (Figure 3C); and open brickwork sheets
(Figure 3D).

Figure 4 shows a schematic representation of two
units used to construct porous and solid sheets (top and
bottom), which, when alternately layered, produce a
multi-tiered set of cages as shown.

Figure 5 shows a schematic representation of an
angled structure having an angle of 120°.

Figure 6 shows the DNA sequence (SEQ ID NO:1) of
genes 34, 35, 36, and 37 of bacteriophage T4.

Figure 7 shows the amino acid sequences (shown in
single-letter codes) of the gene products of genes 34
(SEQ ID NO:2, ORFX SEQ ID NO:3), 35 (SEQ ID NO:4), 36
(SEQ ID NO:5), and 37 (SEQ ID NO:6) of bacteriophage T4. The
amino acid sequences (bottom line of each pair) are aligned
with the nucleotide sequences (top line of each pair). It is
noted that the deduced protein sequence of gene 35 (from NCBI
database) is not believed to be accurate.

Figures 8A-8B show a schematic representation of:
the formation of a P37 dimer initiator from a molecule that
self-assembles into a dimer (Figure 8A); and the formation of
a P37 trimer initiator from a molecule that self-assembles
into a trimer (Figure 8B).

Figure 9 shows a schematic representation of the
formation of the polymer (P37-36)n with an initiator that is
a self-assembling dimer.

DETAILED DESCRIPTION OF THE INVENTION

All patents, patent applications and literature
references cited in the specification are hereby incorporated
by reference in their entirety. In the case of
inconsistencies, the present disclosure, including
definitions, will prevail.

Although the invention is described in terms of
bacteriophage T4 tail fiber proteins, it will be understood
that the invention is also applicable to tail fiber proteins
of other T-even-like phage, e.g., of the T4 family (e.g., T4,
Tula, Tulb), and T2 family (T2, T6, K3, Ox2, Ml, etc.).

DEFINITIONS

"Nanostructures" are defined herein as structures
of different sizes and shapes that are assembled from
nanometer-sized protein components.

"Chimers" are defined herein as chimeric proteins
in which at least the amino- and carboxy-terminal regions are
derived from different original polypeptides, whether the
original polypeptides are naturally occurring or have been
modified by mutagenesis.

"Homodimers" are defined herein as assemblies of
two substantially identical protein subunits that form a
defined three-dimensional structure.

The designation "gp" denotes a monomeric
polypeptide, while the designation "P" denotes homooligomers.
P34, P36, and P37 are presumably homodimers or homotrimers.

An isolated polypeptide that "consists essentially
of" a specified amino acid sequence is defined herein as a
polypeptide having the specified sequence or a polypeptide
that contains conservative substitutions within that
sequence. Conservative substitutions, as those of ordinary
skill in the art would understand, are ones in which an
acidic residue is replaced by an acidic residue, a basic
residue by a basic residue, or a hydrophobic residue by a
hydrophobic residue. Also encompassed is a polypeptide that
lacks one or more amino acids at either the amino terminus or
carboxy terminus, up to a total of five at either terminus,
when the absence of the particular residues has no
discernible effect on the structure or the function of the
polypeptide in practicing the present invention.

The present invention pertains to a new class of
protein building blocks whose dimensions are measured in
nanometers, which are useful in the construction of
microscopic and macroscopic structures. Without wishing to
be bound by theory, it is believed that the basic unit is a
homodimer composed of two identical protein subunits having a
cross-β configuration, although a trimeric structure is also
possible. Thus, as will be apparent, references to a
"homodimer" or "dimerization" as used herein will in many
instances be construed as also referring to a homotrimer or
trimerization. These long, stiff, and stable rod-shaped
units can assemble with other rods using coupling devices
that can be attached genetically or in vitro. The ends of
one rod may attach to different ends of other rods or similar
rods. Variations in the length of the rods, in the angles of
attachment, and in their flexibility characteristics permit
differently-shaped structures to self-assemble in situ. In
this manner the units can self-assemble into predetermined
larger structures of one, two or three dimensions. The
self-assembly can be staged to form structures of precise
dimensions and uniform strength due to the flawless
biological manufacture of the components. The rods can also
be modified by genetic and chemical modifications to form
predetermined specific attachment sites for other chemical
entities, allowing the formation of complex structures.

An important aspect of the present invention is
that the protein units can be designed so that they comprise
rods of different lengths, and can be further modified to
include features that alter their surface properties in
predetermined ways and/or influence their ability to join
with other identical or different units. Furthermore, the
self-assembly capabilities can be expanded by producing
chimeric proteins that combine the properties of two
different members of this class. This design feature is
achieved by manipulating the structure of the genes encoding
these proteins.

As detailed below, the compositions and methods of
the present invention take advantage of the properties of the
natural proteins, i.e., the resulting structures are stiff,
strong, stable in aqueous media, heat resistant, protease
resistant, and can be rendered biodegradable. A large
quantity of units can be fabricated easily in microorganisms.
Furthermore, for ease of automation, large quantities of
parts and subassemblies can be stored and used as needed.

The sequences of the protein subunits are based on
the components of the tail fiber of the T4 bacteriophage of
E. coli. It will be understood that the principles and
techniques can be applied to the tail fibers of other T-even
phages, or other related bacteriophages that have similar
tail and/or fiber structures.

The structure of the T4 bacteriophage tail fiber
(illustrated in Figure 1) can be represented schematically as
follows (N = amino terminus, C = carboxy terminus): N[P34]C -
N[gp35]C - N[P36]C - N[P37]C. P34, P36, and P37 are all
stiff, rod-shaped protein homodimers in which two identical β
sheets, oriented in the same direction, are fused
face-to-face by hydrophobic interactions between the sheets
juxtaposed with a 180° rotational axis of symmetry through
the long axis of the rod. (The structure will vary if P34,
P36, and P37 are homotrimers.) gp35, by contrast, is a
monomeric polypeptide that attaches specifically to the
N-terminus of P36 and then to the C-terminus of P34 and forms
an angle joint between two rods. During T4 infection of E.
coli, two gp37 monomers dimerize to form a P37 homodimer; the
process of dimerization is believed to initiate near the
C-terminus of P37 and to require two E. coli chaperon
proteins. (A variant gp37 with a temperature sensitive
mutation near the C-terminus used in the present invention
requires only one chaperon, gp57, for dimerization.) Once
dimerized, the N-terminus of P37 initiates the dimerization
of two gp36 monomers to a P36 rod. The joint between the
C-terminus of P36 and the N-terminus of P37 is tight and
stiff but noncovalent. The N-terminus of P36 then attaches
to a gp35 monomer; this interaction stabilizes P36 and forms
the elbow of the tail fiber. Finally, gp35 attaches to the
C-terminus of P34 (which uses gp57 for dimerization). Thus,
self assembly of the tail fiber is regulated by a
predetermined order of interaction of specific subunits
whereby structural maturation caused by formation of the
first subassembly permits interaction with new (previously
disallowed) subunits. This results in the production of a
structure of exact specifications from a random mixture of
the components.

In accordance with the present invention, the genes
encoding these proteins may be modified so as to make rods of
different lengths with different combinations of ends. The
properties of the native proteins are particularly
advantageous in this regard. First, the β-sheet is composed
of antiparallel β-strands with β-bends at the left (L) and
right (R) edges. Second, the amino acid side chains
alternate up and down out of the plane of the sheet. The
first property allows bends to be extended to form symmetric
and specific attachment sites between the L and R surfaces,
as well as to form attachment sites for other structures. In
addition, the core sections of the β-sheet can be shortened
or lengthened by genetic manipulations, e.g., by splicing DNA
regions encoding β-bends, on the same edge of the sheet, to
form new bends that exclude intervening peptides, or by
inserting segments of peptide in an analogous manner by
splicing at bend angles. The second property allows amino
acid side chains extending above and below the surface of the
β-sheet to be modified by genetic substitution or chemical
coupling. Importantly, all of the above modifications are
achieved without compromising the structural integrity of the
rod. It will be understood by one skilled in the art that
these properties allow a great deal of flexibility in
designing units that can assemble into a broad variety of
structures, some of which are detailed below.

STRUCTURAL UNITS

The rods of the present invention function like
wooden 2x4 studs or steel beams for construction. In this
case, the surfaces are exactly reproducible at the molecular
level and thereby fitted for specific attachments to similar
or different unit rods at fixed joining sites. The surfaces
are also modified to be more or less hydrophilic, including
positively or negatively charged groups, and have protrusions
built in for specific binding to other units or to an
intermediate joint with two receptor sites. The surfaces of
the rod and a schematic of the unit rod are illustrated in
Figure 2. The three dimensions of the rod are defined as: x,
for the back (B) to front (F) dimension; y, for the down (D)
to up (U) dimension; and z, for the left (L) to right (R)
dimension.

One dimensional multi-unit rods can be most readily
assembled from single unit rods joined along the x axis
(Figure 3A), but regular joining of subunits in either of the
other two dimensions will also form a long structure, with
different cross sections than in the x dimension.

Two dimensional constructs are sheets formed by
interaction of rods along any two axes. 1) Closed simple
sheets are formed from surfaces which overlap exactly, along
any two axes (Figure 3B). 2) Closed brickwork sheets are
formed from interaction between units that have exactly
overlapping surfaces in one dimension and a special type of
overlap in the other (Figure 3C). In this case there must be
two different sets of complementary joints spaced with
exactly 1/2 unit distance between them. If they are centered
(i.e., each set 1/4 from the end) then each joint will be in
the center of the units above and below. If they are offset,
then the joint will be offset as well. In this construction,
the complementary interacting sites are schematized by * and
**. If the interacting sites are each symmetric, the
alternating rows can interact with the rods in either
direction. If they are not symmetric, and can only interact
with interacting rows facing in the same or opposite
direction, the sheet will be made of unidirectional rods or
layers of rods in alternating directions. 3) Open brickwork
sheets (or nets) result when the units are separated by more
than one-half unit (Figure 3D). The dimensions of the
openings (or pores) depend upon the distance (dx) separating
the interacting sites and the distance (dy) by which these
sites separate the surfaces.

Three dimensional constructs require sterically
compatible interactions between all three surfaces to form
solids. 1) Closed solids can assemble from units that
overlap exactly in all three dimensions (e.g., the exact
overlapping of closed simple sheets). In an analogous
manner, closed brickwork sheets can form closed solids by
overlapping sheets exactly or displaced to bring the
brickwork into the third dimension. This requires an
appropriate set of joints on all three pairs of parallel
faces of the unit. 2) Porous solids are made by joining
open brickwork sheets in various ways. For example, if the
units overlap exactly in the third dimension, a solid is
formed with the array of holes of exact dimensions running
perpendicular to the plane of the paper. If instead, a
material is needed with closed spaces, with layers of width
dz (i.e., in the U→D dimension), a simple closed sheet is
layered on the open brickwork sheet to close the openings.
If the overlap of the open brickwork sheet is, e.g., 1/4 unit,
then a rod of length 3/4 units is used to make the sheet.
Joints are then needed in the z dimension. The two units
used to polymerize these alternate layers, and the layers
themselves, are schematized in Figure 4.

All of the above structures are composed of simple
linear rods. A second unit, the angle unit, expands the type
and dimensionality of possible structures. The angle unit
connects two rods at angles different from 180°, akin to an
angle iron. The average angle and its degree of rigidity are
built into this connector structure. For example, the
structure shown in Figure 5 has an angle of 120° and
different specific joining sites at a and at b. The
following are examples of structures that are formed
utilizing angle joints:

1) Open brickwork sheets are expanded and
strengthened in the direction normal to the rod direction by
adding angles perpendicular to the sheet. In this case, a
three dimensional network forms. Attachment of 90° angles to
the ends of the rods makes an angle almost in the plane of
the sheet, allowing new rods added to those angles (which
must have some play out of the plane of the original sheet to
attach in the first place) to form a new sheet, almost
parallel, with an orientation normal to its upper or lower
neighbor.

2) Hexagons are made from a mixture of rods and
angle joints that form 120° angles. In this case, there are
two exclusive sets of joints. Each set is made up of one of
the two ends of the rod and one of the two complementary
sites on the angle. This is a linear structure in the sense
that the hexagon has a direction (either clockwise or
counterclockwise). It can be made into a two dimensional
open net (i.e., a two dimensional honeycomb) by joining the
sides of the hexagons. It can form hexagonal tubes by
joining the top of the hexagon below to the bottom face of
the hexagon above. If the tubes also join by their sides,
they will form an open three dimensional multiple hexagonal
tube.

3) Helical hexagonal tubes are made analogously to
hexagons but the sixth unit is not joined to the first to
close the hexagon. Instead, the end is displaced from the
plane of the hexagon and the seventh and further units are
added to form a hexagonal tube which can be a spring if there
is little or no adhesive force between the units of the
helix, or a stiff rod if there is such a force to maintain
the close proximity of apposing units.
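
The geometry behind the hexagon examples above can be sketched numerically. The following is a minimal illustration, assuming idealized rigid rods of equal length and a perfectly fixed joint angle (real joints have an average angle with some spread, as noted for gp35); the function names are hypothetical. A planar ring closes only when the turn made at each joint, 180° minus the joint angle, divides 360° exactly.

```python
import math


def units_to_close(joint_angle_deg):
    """Number of equal rods needed to close a planar ring, or None.

    joint_angle_deg is the interior angle formed by two rods at a joint.
    """
    turn = 180.0 - joint_angle_deg  # change of heading at each joint
    if turn <= 0:
        return None  # straight (180 degrees) or reflex joints never close
    n = 360.0 / turn
    return round(n) if math.isclose(n, round(n), abs_tol=1e-9) else None


def polygon_vertices(joint_angle_deg, rod_length=1.0):
    """Walk around the ring and return the joint coordinates in order."""
    n = units_to_close(joint_angle_deg)
    if n is None:
        raise ValueError("this joint angle does not close into a planar ring")
    turn = math.radians(180.0 - joint_angle_deg)
    x, y, heading = 0.0, 0.0, 0.0
    vertices = [(x, y)]
    for _ in range(n):
        x += rod_length * math.cos(heading)
        y += rod_length * math.sin(heading)
        heading += turn
        vertices.append((x, y))
    return vertices


if __name__ == "__main__":
    for angle in (90, 120, 137):
        print(angle, units_to_close(angle))
```

Under these assumptions a 120° joint closes after six rods and a 90° joint after four, whereas an angle of 137° does not close a planar ring at all.
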

It will be apparent to one skilled in the art that
the compositions and methods of the present invention also
encompass other polygonal structures such as octagons, as
well as open solids such as tetrahedrons and icosahedrons
formed from triangles and boxes formed from squares and
rectangles. The range of structures is limited only by the
types of angle units and the substituents that can be
engineered on the different axes of the rod units. For
example, other naturally occurring angles are found in the
fibers of bacteriophage T7, which has a 90° angle (Steven et
al., J. Mol. Biol. 200: 352-365, 1988).

DESIGN AND PRODUCTION OF THE ROD PROTEINS

The protein subunits that are used to construct the
nanostructures of the present invention are based on the four
polypeptides that comprise the tail fibers of bacteriophage
T4, i.e., gp34, gp35, gp36 and gp37. The genes encoding
these proteins have been cloned, and their DNA and protein
sequences have been determined (for genes 36 and 37 see Oliver
et al. J. Mol. Biol. 153: 545-568, 1981). The DNA and amino
acid sequences of genes 34, 35, 36 and 37 are set forth in
Figures 6 and 7 below.

Gp34, gp35, gp36, and gp37 are produced naturally
following infection of E. coli cells by intact T4 phage
particles. Following synthesis in the cytoplasm of the
bacterial cell, the gp34, 36, and 37 monomers form
homodimers, which are competent for assembly into maturing
phage particles. Thus, E. coli serves as an efficient and
convenient factory for synthesis and dimerization of the
protein subunits described herein below.

In practicing the present invention, the genes
encoding the proteins of interest (native, modified, or
recombined) are incorporated into DNA expression vectors that
are well known in the art. These circular plasmids typically
contain selectable marker genes (usually conferring
antibiotic resistance to transformed bacteria), sequences
that allow replication of the plasmid to high copy number in
E. coli, and a multiple cloning site immediately downstream
of an inducible promoter and ribosome binding site. Examples
of commercially available vectors suitable for use in the
present invention include the pET system (Novagen, Inc.,
Madison, WI) and Superlinker vectors pSE280 and pSE380
(Invitrogen, San Diego, CA).

The strategy is to 1) construct the gene of
interest and clone it into the multiple cloning site; 2)
transform E. coli cells with the recombinant plasmid; 3)
induce the expression of the cloned gene; 4) test for
synthesis of the protein product; and, finally, 5) test for
the formation of functional homodimers. In some cases,
additional genes are also cloned into the same plasmid, when
their function is required for dimerization of the protein of
interest. For example, when wild-type or modified versions
of gp37 are expressed, the bacterial chaperon gene 57 is also
included; when wild-type or modified gp36 is expressed, the
wild-type version or a modified version of the gp37 gene is
included. The modified gp37 should have the capacity to
dimerize and contain an N-terminus that can chaperon the
dimerization of gp36. This method allows the formation of
monomeric gene products and, in some cases, maturation of
monomers to homodimeric rods in the absence of other
phage-induced proteins normally present in a T4-infected
cell.

Steps 1-4 of the above-defined strategy are
achieved by methods that are well known in the art of
recombinant DNA technology and protein expression in
bacteria. For example, in step 1, restriction enzyme
cleavage at multiple sites, followed by ligation of
fragments, is used to construct deletions in the internal rod
segment of gp34, 36, and 37 (see Example 1 below).
Alternatively, a single or multiple restriction enzyme
cleavage, followed by exonuclease digestion (EXO-SIZE, New
England Biolabs, Beverly, MA), is used to delete DNA
sequences in one or both directions from the initial cleavage
site; when combined with a subsequent ligation step, this
procedure produces a nested set of deletions of increasing
sizes. Similarly, standard methods are used to recombine DNA
segments from two different tail fiber genes, to produce
chimeric genes encoding fusion proteins (called "chimers" in
this description). In general, this last method is used to
provide alternate N- or C-termini and thus create novel
combinations of ends that enable new patterns of joining of
different rod segments. A representative of this type of
chimer, the fusion of gp37-36, is described in Example 2.
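
The idea of a nested set of deletions can be made concrete with a small sketch. This is only a toy model under stated assumptions: it treats the DNA as a plain string, removes a fixed number of bases per step, and uses hypothetical names (`nested_deletions`, `cut_site`, `step`, `max_deleted`) and a made-up sequence; an actual exonuclease digestion yields a distribution of end points rather than evenly spaced ones.

```python
def nested_deletions(sequence, cut_site, step, max_deleted, bidirectional=False):
    """Return sequences carrying progressively larger deletions.

    The deletion starts at index `cut_site` and grows by `step` bases per
    product, either downstream only or in both directions, followed by
    re-joining (ligation) of the two remaining ends.
    """
    products = []
    for deleted in range(step, max_deleted + 1, step):
        left_end = max(0, cut_site - deleted) if bidirectional else cut_site
        right_start = min(len(sequence), cut_site + deleted)
        products.append(sequence[:left_end] + sequence[right_start:])
    return products


if __name__ == "__main__":
    # Hypothetical 16-base sequence, cut in the middle.
    template = "AAAACCCCGGGGTTTT"
    for product in nested_deletions(template, cut_site=8, step=2, max_deleted=4):
        print(product)
```

When the deleted region lies inside a coding sequence, only deletions that remove a multiple of three bases keep the downstream reading frame intact, which is one reason the resulting clones are screened afterwards.
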

The preferred hosts for production of these proteins (Step 2)
are E. coli strains BL21(DE3) and BL21(DE3/pLysS) (available
commercially from Novagen, Madison, WI), although other
compatible recA strains, such as HMS174(DE3) and
HMS174(DE3/pLysS), can be used. Transformation with the
recombinant plasmid (Step 2) is accomplished by standard
methods (Sambrook, J., Molecular Cloning, Cold Spring Harbor
Laboratories, Cold Spring Harbor, NY; this is also the source
for standard recombinant DNA methods used in this invention).
Transformed bacteria are selected by virtue of their
resistance to antibiotics, e.g., ampicillin or kanamycin. The
method by which expression of the cloned tail fiber genes is
induced (Step 3) depends upon the particular promoter used.
A preferred promoter is plac (with a lacIq on the vector to
reduce background expression), which can be regulated by the
addition of isopropylthiogalactoside (IPTG). A second
preferred promoter is pT7φ10, which is specific to T7 RNA
polymerase and is not recognized by E. coli RNA polymerase.
T7 RNA polymerase, which is resistant to rifamycin, is
encoded on the defective lambda DE lysogen in the E. coli
BL21 chromosome. T7 polymerase in BL21(DE3) is
super-repressed by the lacIq gene in the plasmid and is
induced and regulated by IPTG.

Typically, a culture of transformed bacteria is
incubated with the inducer for a period of hours, during
which the synthesis of the protein of interest is monitored.
In the present instance, extracts of the bacterial cells are
prepared, and the T4 tail fiber proteins are detected, for
example, by SDS-polyacrylamide gel electrophoresis.

Once the modified protein is detected in bacterial
extracts, it is necessary to ascertain whether or not it
forms appropriate homodimers (Step 4). This is accomplished
initially by testing whether the protein is recognized by an
antiserum specific to the mature dimerized form of the
protein.

Tail fiber-specific antisera are prepared as
described (Edgar, R.S. and Lielausis, I., Genetics 52: 1187,
1965; Ward et al, J. Mol. Biol. 54:15, 1970). Briefly, whole
T4 phage are used as an immunogen; optionally, the resulting
antiserum is then adsorbed with tail-less phage particles,
thus removing all antibodies except those directed against
the tail fiber proteins. In a subsequent step, different
aliquots of the antiserum are adsorbed individually with
extracts that each lack a particular tail fiber protein. For
example, if an extract containing only tail fiber components
P34, gp35, and gp36 (derived from a cell infected with a
mutant T4 lacking a functional gp37 gene) is used for
adsorption, the resulting antiserum will recognize only
mature P37 and dimerized P36-P37. A similar approach may be
used to prepare individual antisera that recognize only
mature (i.e., homodimerized) P34 and P36 by adsorbing with
extracts containing distal half tail fibers or P34, gp35 and
P37, respectively. An alternative is to raise antibody
against purified tail fiber halves, e.g., P34 and
gp35-P36-P37. Anti-gp35-P36-P37 can then be adsorbed with
P36-P37 to produce anti-gp35, and anti-P36 can be produced by
adsorption with P37 and gp35. Anti-P37, anti-gp35, and
anti-P34 can also be produced directly by using purified P37,
gp35, and P34 as immunogens. Another approach is to raise
specific monoclonal antibodies against the different tail
fiber components or segments thereof.
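
The adsorption logic described above amounts to a set difference, which the following sketch illustrates. It is a deliberately simplified model, assuming that each antibody specificity maps to exactly one named component and ignoring cross-reactivity; the labels and the `adsorb` helper are hypothetical, with "gp36" standing for the undimerized monomer and "P36" for the dimerized form.

```python
def adsorb(antiserum, extract):
    """Remove every specificity whose target is present in the adsorbing extract."""
    return set(antiserum) - set(extract)


# Specificities assumed to remain after adsorption with tail-less phage.
ANTI_TAIL_FIBER = {"P34", "gp35", "gp36", "P36", "P37"}

if __name__ == "__main__":
    # Extract from cells infected with a mutant lacking functional gene 37:
    # it contains P34, gp35 and undimerized gp36 only.
    extract_without_gp37 = {"P34", "gp35", "gp36"}
    print(sorted(adsorb(ANTI_TAIL_FIBER, extract_without_gp37)))
    # prints ['P36', 'P37']
```

In this toy model the adsorbed antiserum retains only the specificities for dimerized P36 and mature P37, which mirrors the example given in the text.
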

Specific antibodies to subunits or tail parts are
used in any of the following ways to detect appropriately
homodimerized tail fiber proteins: 1) Bacterial colonies are
screened for those expressing mature tail fiber proteins by
directly transferring the colonies, or, alternatively,
samples of lysed or unlysed cultures, to nitrocellulose
filters, lysing the bacterial cells on the filter if
necessary, and incubating with specific antibodies.
Formation of immune complexes is then detected by methods
widely used in the art (e.g., secondary antibody conjugated
to a chromogenic enzyme or radiolabeled Staphylococcal
Protein A). This method is particularly useful to screen
large numbers of colonies, e.g., those produced by EXO-SIZE
deletion as described above. 2) Bacterial cells expressing
the protein of interest are first metabolically labelled with
35S-methionine, followed by preparation of extracts and
incubation with the antiserum. The immune complexes are then
recovered by incubation with immobilized Protein A followed
by centrifugation, after which they may be resolved by
SDS-polyacrylamide gel electrophoresis.

An alternative competitive assay for testing whether
internally deleted tail fiber proteins that do not permit phage
infection nonetheless retain the ability to dimerize and
associate with their appropriate partners utilizes an in vitro
complementation system. 1) A bacterial extract containing the
modified protein of interest, as described above, is mixed with
a second extract prepared from cells infected with a T4 phage
that is mutant in the gene of interest. 2) After several hours
of incubation, a third extract is added that contains the
wild-type version of the protein being tested, and incubation
is continued for several additional hours. 3) Finally, the
extract is titered for infectious phage particles by infecting
E. coli and quantifying the phage plaques that result. A
modified tail fiber protein that is correctly dimerized and
able to join with its partners is incorporated into tail fibers
in a non-functional manner in Step 1, thereby preventing the
incorporation of the wild-type version of the protein in Step
2; the result is a reduction in the titer of the resulting
phage sample. By contrast, if the modified protein is unable
to dimerize and thus form proper N- and/or C-termini, it will
not be incorporated into phage particles in Step 1, and thus
will not compete with assembly of intact phage particles in
Step 2; the phage titer should thus be equivalent to that
observed when no modified protein is added in Step 1 (a
negative control).
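
The read-out of this competition assay can be sketched in a few lines of Python. This is only an illustration: the function names and the 0.5 cut-off are assumptions introduced here, not values taken from the specification, and a real assay would set the cut-off from replicate controls.

```python
def relative_titer(titer_with_modified: float, titer_negative_control: float) -> float:
    """Phage titer with modified protein added in Step 1, relative to the
    negative control in which no modified protein is added."""
    if titer_negative_control <= 0:
        raise ValueError("negative-control titer must be positive")
    return titer_with_modified / titer_negative_control


def competes_for_assembly(titer_with_modified: float,
                          titer_negative_control: float,
                          threshold: float = 0.5) -> bool:
    """Return True when the titer is reduced below the (assumed) threshold,
    i.e. the modified protein appears to be incorporated in Step 1 and to
    block incorporation of the wild-type protein in Step 2."""
    return relative_titer(titer_with_modified, titer_negative_control) < threshold


# Hypothetical plaque-forming units per ml, for illustration only.
print(competes_for_assembly(2e6, 1e8))    # strongly reduced titer
print(competes_for_assembly(9.5e7, 1e8))  # titer close to the control
```

A titer close to the negative control is read as failure to dimerize, and a clearly reduced titer as successful incorporation of the modified protein.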

Another way in which to test whether chimers and internally
deleted tail fiber proteins retain the ability to dimerize and
associate with their appropriate partners is done in vivo. The
assay detects the ability of such chimers and deleted proteins
to compete with normal phage parts for assembly, thus reducing
the burst size of a wild-type phage infecting the same host
cell in which the chimers or deleted proteins are recombinantly
expressed. Thus, expression from an expression vector encoding
the chimer or deleted protein is induced inside a cell, which
cell is then infected by a wild-type phage. Inhibition of
wild-type phage production demonstrates the ability of the
recombinant chimer or protein to associate with the appropriate
tail fiber proteins of the phage.

The above-described methods are used, alone and in 
combination, in the design and production of different types 
of modified tail fiber proteins. For example, a preliminary
screen of a large number of bacterial colonies for those
expressing a properly dimerized protein will identify
positive colonies, which can then be individually tested by
in vitro complementation.

Non-limiting examples of novel proteins that are 
encompassed by the present invention include: 

1) Internally deleted gp34, 36, and 37 polypeptides (See
Example 1 below);

2) A C-terminally truncated gp36 fused to the N-terminus of
N-terminally truncated gp37;

3) A fusion between gp36 and gp37 in which gp37 is N-terminal
to gp36 (i.e., in reverse of the natural order), termed herein
"gp37-36 chimer" (See Example 2 below);

4) A fusion between gp34 and gp36 in which gp36 is N-terminal
to gp34 (i.e., in reverse of the natural order), termed herein
"gp36-34 chimer";

5) A variant of gp36 in which the C-terminus is mutated such
that it lacks the capability to interact with (and dimerize in
response to) the N-terminus of wild-type P37, termed herein
"gp36*";

6) A variant of gp37 in which the N-terminus is mutated such
that it forms a P37 that lacks the capability to interact with
the C-terminus of wild-type gp36, termed herein "*P37";

7) Variants of gp36* and *P37 that can interact with each
other, but not with gp36 or P37;

8) A variant "P37-36 chimer" in which the gp36 moiety is
derived from the variant as in 5), i.e., "P37-36*" (For 5-8,
See Example 3 below);

9) A variant "P37-36 chimer" in which the gp37 moiety is
derived from the variant as in 6) above, i.e., "*P37-36";

10) A variant P37-36 chimer, *P37-P36*, in which the gp36 and
gp37 moieties are derived from the variants in 5) and 6);

11) A fusion between gp36 and gp34 in which gp36 sequences are
placed N-terminal to gp34, the dimer of which is termed herein
"P36-34 chimer";

12) Variants of gp35 that form average angles different from
137° or 158° (the native angle), e.g., less than about 125° or
more than about 145° under conditions wherein the wild-type
gp35 protein forms an angle of 137° when combined with the P34
and P36-P37 dimers, and/or exhibit more or less flexibility
than the native polypeptide;

13) Variants of gp34, 35, 36 and 37 that exhibit thermolabile
interactions or other variant specific interactions with their
cognate partners; and

14) Variants of gp37 in which the C-terminal domain of the
polypeptide is modified to include sequences that confer
specific binding properties on the entire molecule, e.g.,
sequences derived from avidin that recognize biotin, sequences
derived from immunoglobulin heavy chain that recognize
Staphylococcal A protein, sequences derived from the Fab
portion of the heavy chain of monoclonal antibodies to which
their respective Fab light chain counterparts could attach and
form an antigen-binding site, immunoactive sequences that
recognize specific antibodies, or sequences that bind specific
metal ions. These ligands may be immobilized to facilitate
purification and/or assembly.

In specific embodiments, the chimers of the invention comprise
a portion consisting of at least the first 10 (N-terminal)
amino acids of a first tail fiber protein fused via a peptide
bond to a portion consisting of at least the last 10
(C-terminal) amino acids of a second tail fiber protein. The
first and second tail fiber proteins can be the same or
different proteins. In another embodiment, the chimers
comprise an amino acid portion in the range of the first 10-60
amino acids from a tail fiber protein fused to an amino acid
portion in the range of the last 10-60 amino acids from a
second tail fiber protein. In another embodiment, each amino
acid portion is at least 20 amino acids of the tail fiber
protein. The chimers comprise portions, i.e., not full-length
tail fiber proteins, fused to one another. In a preferred
aspect, the first tail fiber protein portion of the chimer is
from gp37, and the second tail fiber protein portion is from
gp36. Such a chimer (gp37-36 chimer), after oligomerization to
form P37-36, can polymerize to other identical oligomers. A
gp36-34 chimer, after oligomerization to form P36-34, can bind
to gp35, and this unit can then polymerize. In another
embodiment, the first portion is from gp37, and the second
portion is from gp34. In a preferred aspect, the chimers of
the invention are made by insertions or deletions within a β
turn of the β structure of the tail fiber proteins. Most
preferably, insertion into a tail fiber sequence, or fusing to
another tail fiber protein sequence (preferably via
manipulation at the recombinant DNA level to produce the
desired encoded protein), is done so that sequences in β turns
on the same edge of the β-sheet are joined.

In addition to the above-described chimers, nanostructures of
the invention can also comprise tail fiber protein deletion
constructs that are truncated at one end, e.g., are lacking an
amino- or carboxy- end (of at least 5 or 10 amino acids) of the
molecule. Such molecules truncated at the amino-terminus,
e.g., of truncated gp37, gp34, or gp36, can be used to "cap" a
nanostructure, since, once incorporated, they will terminate
polymerization. Such molecules preferably comprise a fragment
of a tail fiber protein lacking at least the first 10, 20, or
60 amino terminal amino acids.

In order to change the length of the rod component proteins as
desired, portions of the same or different tail fiber proteins
can be inserted into a tail fiber chimer to lengthen the rod,
or be deleted from a chimer, to shorten the rod.



ASSEMBLY OF INDIVIDUAL ROD COMPONENTS INTO NANOSTRUCTURES

Expression of the proteins of the present invention in E. coli
as described above results in the synthesis of large
quantities of protein, and allows the simultaneous expression
and assembly of different components in the same cells. The
methods for scale-up of recombinant protein production are
straightforward and widely known in the art, and many standard
protocols can be used to recover native and modified tail
fiber proteins from a bacterial culture.

In a preferred embodiment, native (nonrecombinant) gp35 is
isolated for use by growing up a bacteriophage T4 having an
amber mutation in gene 36, in a su° bacterial strain (not an
amber suppressor), and isolating gp35 from the resulting
culture by standard methods.

P34, P36-P37, P37, and chimers derived from them are purified
from E. coli cultures as mature dimers. Gp35 and variants
thereof are purified as monomers. Purification is achieved by
the following procedures or combinations thereof, using
standard methods: 1) chromatography on molecular sieve,
ion-exchange, and/or hydrophobic matrices; 2) preparative
ultracentrifugation; and 3) affinity chromatography, using as
the immobilized ligand specific antibodies or other specific
binding moieties. For example, the C-terminal domain of P37
binds to the lipopolysaccharide of E. coli B. Other T4-like
phages have P37 analogues that bind other cell surface
components such as OmpF or Tsx protein. Alternatively, if the
proteins have been engineered to include heterologous domains
that act as ligands or binding sites, the cognate partner is
immobilized on a solid matrix and used in affinity
purification. For example, such a heterologous domain can be
biotin, which binds to a streptavidin-coated solid phase.

Alternatively, several components are co-expressed in the same
bacterial cells, and sub-assemblies of larger nanostructures
are purified subsequent to limited in vivo assembly, using the
methods enumerated above.

The purified components are then combined in vitro under
conditions where assembly of the desired nanostructure occurs
at temperatures between about 4°C and about 37°C, and at pHs
between about 5 and about 9. For a given nanostructure,
optimal conditions for assembly (i.e., type and concentration
of salts and metal ions) are easily determined by routine
experimentation, such as by changing each variable
individually and monitoring formation of the appropriate
products.

Alternatively, one or more crude bacterial extracts may be
prepared, mixed, and assembly reactions allowed to proceed
prior to purification.

In some cases, one or more purified components assemble
spontaneously into the desired structure, without the
necessity for initiators. In other cases, an initiator is
required to nucleate the polymerization of rods or sheets.
This offers the advantage of localizing the assembly process
(i.e., if the initiator is immobilized or otherwise localized)
and of regulating the dimensions of the final structure. For
example, rod components that contain a functional P36
C-terminus require a functional P37 N-terminus to initiate rod
formation stoichiometrically; thus, altering the relative
amount of initiator and rod component will influence the
average length of rod polymer. If the ratio is n, the average
rod will be approximately
(P37-36)n - N-terminus P37-P37 C-terminus.
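
The stoichiometric relationship described above can be sketched as a one-line calculation. The sketch assumes ideal behaviour (every initiator nucleates exactly one rod and all rod component is consumed); the function names are illustrative and not part of the specification.

```python
def average_rod_units(rod_component: float, initiator: float) -> float:
    """Average number of P37-36 units per rod, assuming each initiator
    nucleates one rod and all rod component is incorporated.
    Both amounts must be in the same molar units."""
    if initiator <= 0:
        raise ValueError("initiator amount must be positive")
    return rod_component / initiator


def describe_rod(n: float) -> str:
    """Write the product in the notation used in the text."""
    return f"(P37-36){n:g} - N-terminus P37-P37 C-terminus"


n = average_rod_units(rod_component=50.0, initiator=5.0)
print(describe_rod(n))
```

Halving the amount of initiator at a fixed amount of rod component doubles the expected average length under these assumptions.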

In still other cases, the final nanostructure is composed of
two or more components that cannot self-assemble individually
but only in combination with each other. In this situation,
alternating cycles of assembly can be staged to produce final
products of precisely defined structure (see Example 6B
below).

When an immobilized initiator is used, it may be desirable to
remove the polymerized unit from the matrix after staged
assembly. For this purpose specialized initiators are
engineered so that the interaction with the first rod
component is rendered reversibly thermolabile (see Example 5
below). In this way, the polymer can be easily separated from
the matrix-bound initiator, thereby permitting: 1) easy
preparation of stock solutions of uniform parts or
subassemblies, and 2) re-use of the matrix-bound initiator for
multiple cycles of polymer initiation, growth, and release.

In an embodiment in which a nanostructure is assembled that is
attached to a solid matrix via gp34 (or P34), one way in which
to detach the nanostructure to bring it into solution is to
use a mutant (thermolabile) gp34 that can be made to detach
upon exposure to a higher temperature (e.g., 40°C). Such a
mutant gp34, termed T4 tsB45, having a mutation at its
C-terminal end such that P34 attaches to the distal tail fiber
half at 30°C but can be separated from it in vitro by
incubation at 40°C in the presence of 1% SDS (unlike wild-type
T4, which is stable under these conditions), has been reported
(Seed, 1980, Studies of the Bacteriophage T4 Proximal Half
Tail Fiber, Ph.D. Thesis, California Institute of Technology),
and can be used.

Proteins which catalyze the formation of correct (lowest
energy) stable secondary (2°) structure of proteins are called
chaperonin proteins. (Often, especially in globular proteins,
this stabilization is aided by tertiary structure, e.g.,
stabilization of β-sheets by their interaction in β-barrels or
by interaction with α-helices.) Normally chaperonins prevent
intrachain or interchain interactions which would produce
untoward metastable folding intermediates and prevent or delay
proper folding. There are two known accessory proteins, gp57
and gp38, in the morphogenesis of T4 phage tail fibers which
are sometimes called chaperonins because they are essential
for proper maturation of the protein oligomers but are not
present in the final structures.

The usual chaperonin system (e.g., groEL/ES) interacts with
certain oligopeptide moieties of the gene product to prevent
unwanted interactions with oligopeptide moieties elsewhere on
the same polypeptide or another peptide. These would form
metastable folding intermediates which retard or prevent
proper folding of the polypeptide to its native (lower energy)
state.

Gp57, probably in conjunction with some membrane protein(s),
has the role of juxtaposing (and aligning) and/or initiating
the folding of 2 or 3 identical gp37 molecules. The aligned
peptides then zip up (while mutually stabilizing their nascent
β-structures) to form a beam, without further interaction with
gp57. Gp57 acts in T4 assembly not only for oligomerization of
gp37 but also for gp34 and gp12.

STRUCTURAL COMPONENTS FOR SELF-ASSEMBLY OF BEAMS IN VITRO

Alternatively to starting the polymerization of chimers with
the use of a preformed chimeric or natural oligomeric unit
called an initiator produced in vivo, molecules (preferably
peptides) that can self-assemble can be produced as fusion
proteins, fused to the N- or C-terminus of tail fiber variants
of the invention (chimers, deletion/insertion constructs) to
align their ends and thus to facilitate their subsequent
unaided folding into oligomeric, stable rod-like units in
vitro, in the absence of the chaperonin proteins (e.g., gp57)
and host cell membrane proteins.

As an illustration, consider the P37 unit as an initiator of
gp37-36 oligomerization and polymerization. Normally, proper
folding of gp37 to a P37 initiator requires a phage-infected
cell membrane, and two chaperone proteins, gp38 and gp57. In a
preferred embodiment, the need for gp38 can be obviated by use
of a mutation, ts3813 (a duplication of 7 residues just
downstream of the transition zone of gp37), which suppresses
gene 38 (Wood, W.B., F.A. Eiserling and R.A. Crowther, 1994,
"Long Tail Fibers: Genes, Proteins, Structure, and Assembly,"
in Molecular Biology of Bacteriophage T4 (Jim D. Karam,
Editor), American Society for Microbiology, Washington, D.C.,
pp. 282-290). If a moiety that self-assembles into a dimer or
trimer or other oligomer ("self-assembling moiety") is fused
to a C-terminal deletion of gp37 downstream or upstream of the
transition region (the transition region is a conserved 17
amino acid residue region in T4-like tail fiber proteins where
the structure of the protein narrows to a thin fiber; see
Henning et al., 1994, "Receptor recognition by T-even-type
coliphages," in Molecular Biology of Bacteriophage T4, Karam
(ed.), American Society for Microbiology, Washington, D.C.,
pp. 291-298; Wood et al., 1994, "Long tail fibers: Genes,
proteins, structure, and assembly," in Molecular Biology of
Bacteriophage T4, Karam (ed.), American Society for
Microbiology, Washington, D.C., pp. 282-290), then, when
expressed, the self-assembling moieties will oligomerize in
parallel and thus align the fused gp37 peptides, permitting
them to fold in vitro, in the absence of other chaperonin
proteins.

If P37 is a dimer (Figure 8A), the self-assembling moiety can
be a self-dimerizing peptide such as the leucine zipper, made
from residues 250-281 from the yeast transcription factor,
GCN4 (E.K. O'Shea, R. Rutkowski and P.S. Kim, Science 243:538,
1989), or the self-dimerizing mutant leucine zipper peptide,
pIL, in which the a positions are substituted with isoleucine
and the d positions with leucine (Harbury P.B., T. Zhang, P.S.
Kim and T. Alber. 1993. A Switch Between Two-, Three-, and
Four-Stranded Coiled Coils in GCN4 Leucine Zipper Mutants.
Science, 262:1401-1407). If P37 is a trimer (Figure 8B), the
self-assembling moiety can be a self-trimerizing mutant
leucine zipper peptide, pII, in which both the a and d
positions are substituted with isoleucine (Harbury P.B., et
al. ibid). Alternatively, a collagen peptide can be used as
the self-assembling moiety, such as that described by Bella et
al. (J. Bella, M. Eaton, B. Brodsky and H.M. Berman. 1994.
Crystal and Molecular Structure of a Collagen-Like Peptide at
1.9 Å Resolution. Science, 266:75-81), which self-aligns by an
inserted specific non-repeating alanine residue near the
center.
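
The a/d substitution pattern that distinguishes the pIL and pII peptides can be illustrated with a short sketch that walks a sequence through the heptad register (abcdefg). The placeholder sequence of "X" residues and the function names are assumptions made for illustration; the actual GCN4 sequence and its register are not reproduced here.

```python
HEPTAD = "abcdefg"


def heptad_register(length: int, first_position: str = "a") -> str:
    """Heptad position of every residue, given the position of residue 1."""
    start = HEPTAD.index(first_position)
    return "".join(HEPTAD[(start + i) % 7] for i in range(length))


def substitute_core(seq: str, a_residue: str, d_residue: str,
                    first_position: str = "a") -> str:
    """Replace every a position with a_residue and every d position with
    d_residue, leaving all other positions unchanged."""
    register = heptad_register(len(seq), first_position)
    out = []
    for residue, position in zip(seq, register):
        if position == "a":
            out.append(a_residue)
        elif position == "d":
            out.append(d_residue)
        else:
            out.append(residue)
    return "".join(out)


placeholder = "X" * 14  # two heptads of unspecified residues

pil_like = substitute_core(placeholder, a_residue="I", d_residue="L")
pii_like = substitute_core(placeholder, a_residue="I", d_residue="I")
print(pil_like)  # Ile at a, Leu at d (dimer-forming pattern in the text)
print(pii_like)  # Ile at both a and d (trimer-forming pattern in the text)
```

Only the core positions change between the two patterns; the remaining five positions of each heptad are untouched.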

Self-assembling moieties can be used to make initiators for
polymerizations in the absence of the normal initiators. For
example, to create an initiator for oligomerization and
polymerization of the chimeric monomer, gp37-36, gp37-36-C2
can be used as illustrated in Figure 9. (C2 means that a
dimer-forming peptide is fused to the C-terminus of the gp36
moiety. This is used if the beam is a dimeric structure.
Otherwise C3, a trimer-forming peptide fused to the
C-terminus, would be used.) Furthermore, use of the E. coli
lac repressor N-terminus, e.g., which associates as a
tetramer, with two coils facing in each direction, could join
two dimers (or polymers of dimers) end to end, either at their
N- or C-termini depending upon which end the self-assembling
peptides were placed. They could also join N- to C-termini.
In any case, alone, they could only form a dimer, each end of
which would be extensible by adding an appropriate chimer
monomer (as shown for the simpler case in Figure 9).

In an alternative embodiment, the self-assembling moiety can
be fused to the N-termini of the chimer. In a specific
embodiment, the self-assembling moiety is fused to at least a
10 amino acid portion of a T-even-like tail fiber protein.

A self-assembling moiety that assembles into a heteroligomer
can also be used. For example, if polymerization between
beams is directed by the surface of a dimeric cross-β surface,
addition of a heterodimeric unit with one surface which does
not promote further polymerization would be very useful to cap
the penultimate unit and thus terminate polymerization. If
the two types of coiled regions of the self-assembling moiety
are much more attractive to each other than to themselves,
then all of the dimers will be heterodimers. Such is the case
for the N-terminal Jun and Fos leucine zipper regions.

A further advantage to such heterodimeric units is the ability
to stage polymerization and thus build one unit (or one
surface in a 2D array) at a time. For example, suppose surface
A attaches to B but neither attaches to itself ([A<->B] is
used to symbolize this type of interaction). Mix A/A and B/B0
(B0 is attached to a matrix for easy purification). This will
form B0/B-A/A. Now wash out A/A and add B/B. The construct is
now B0/B-A/A-B/B. Now add A/A0. The construct is now
B0/B-A/A-B/B-A/A0 and no more beams can be added. There are
of course many other possibilities.
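
The staged scheme above can be mimicked with a small model in which each beam is a pair of end surfaces and only A and B surfaces attach to each other. The representation (tuples, the `add_beam` helper, "0"-suffixed surfaces treated as non-binding) is an assumption made for illustration, not a description of any actual protocol.

```python
def can_bind(surface_1: str, surface_2: str) -> bool:
    """[A<->B]: A attaches to B, and neither attaches to itself.
    Surfaces written A0 or B0 (matrix-bound or capping) never bind."""
    return {surface_1, surface_2} == {"A", "B"}


def add_beam(construct: list, beam: tuple) -> list:
    """Attach a beam to the free end of the construct if its leading
    surface binds the exposed surface; otherwise return it unchanged
    (the unbound beam would be washed out)."""
    exposed = construct[-1][1]
    if can_bind(exposed, beam[0]):
        return construct + [beam]
    return construct


def show(construct: list) -> str:
    return "-".join(f"{left}/{right}" for left, right in construct)


construct = [("B0", "B")]                      # matrix-bound starting beam
for beam in [("A", "A"), ("B", "B"), ("A", "A0")]:
    construct = add_beam(construct, beam)      # one staged addition per cycle
    print(show(construct))
```

After the capping beam A/A0 is added, further calls to `add_beam` leave the construct unchanged, which corresponds to the statement that no more beams can be added.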

APPLICATIONS

The uses of the nanostructures of the present invention are
manifold and include applications that require highly regular,
well-defined arrays of fibers, cages, or solids, which may
include specific attachment sites that allow them to associate
with other materials.

In one embodiment, a three-dimensional hexagonal array of
tubes is used as a molecular sieve or filter, providing
regular vertical pores of precise diameter for selective
separation of particles by size. Such filters can be used for
sterilization of solutions (i.e., to remove microorganisms or
viruses), or as a series of molecular-weight cut-off filters.
In this case, the protein components of the pores may be
modified so as to provide specific surface properties (i.e.,
hydrophilicity or hydrophobicity, ability to bind specific
ligands, etc.). Among the advantages of this type of
filtration device is the uniformity and linearity of pores and
the high pore to matrix ratio.

In another embodiment, long one-dimensional fibers are
incorporated, for example, into paper or cement or plastic
during manufacture to provide added wet and dry tensile
strength.

In still another embodiment, different nanostructure arrays
are impregnated into paper and fabric as anti-counterfeiting
markers. In this case, a simple color-linked antibody reaction
(such as those commercially available in kits) is used to
verify the origin of the material. Alternatively, such
nanostructure arrays could bind dyes or other substances,
either before or after incorporation, to color the paper or
fabrics or modify their appearance or properties in other
ways.

KITS 

The invention also provides kits for making nanostructures,
comprising in one or more containers the chimers and deletion
constructs of the invention. For example, one such kit
comprises in one or more containers purified gp35 and purified
gp36-34 chimer. Another such kit comprises purified gp37-36
chimer.

The following examples are intended to illustrate the present
invention without limiting its scope.

In the examples below, all restriction enzymes, nucleases,
ligases, etc. are commercially available from numerous
commercial sources, such as New England Biolabs (NEB),
Beverly, MA; Life Technologies (GIBCO-BRL), Gaithersburg, MD;
and Boehringer Mannheim Corp. (BMC), Indianapolis, IN.

EXAMPLE 1: CONSTRUCTION AND EXPRESSION OF INTERNALLY DELETED P37

The gene encoding gp37 contains two sites for the restriction
enzyme Bgl II, the first cleavage occurring after nucleotide
293 and the second after nucleotide 1486 (the nucleotides are
numbered from the initiator methionine codon ATG). Thus,
digestion of a DNA fragment encoding gp37 with Bgl II,
excision of the intervening fragment (nucleotides 294-1485)
and re-ligation of the 5' and 3' fragments results in the
formation of an internally deleted gp37, designated ΔP37, in
which arginine-98 is joined with serine-497.

The restriction digestion reaction mix contains:
gp37 plasmid DNA (1 µg/µl)    2 µl
NEB buffer #2 (10X)           1 µl
H2O                           6 µl
Bgl II (10 U/µl)              1 µl

The gp37 plasmid signifies a pT7-5 plasmid into which gene 37
has been inserted in the multiple cloning site, downstream of
a good ribosome binding site and of gene 57 to chaperon the
dimerization. The reaction is incubated for 1 h at 37°C.
Then, 89 µl of T4 DNA ligase buffer and 1 µl of T4 DNA ligase
are added, and the reaction is continued at 16°C for 4 hours.
2 µl of the Stu I restriction enzyme are then added, and
incubation continued at 37°C for 1 h. (The Stu I restriction
enzyme digests residual plasmids that were not cut by Bgl II
in the first step, reducing their transformability by about
100-fold.)
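
As a quick consistency check on the volumes in this protocol, the arithmetic can be written out as below. The individual volumes are those read from the reaction mix and the subsequent additions described above (2, 1, 6 and 1 µl, then 89 + 1 µl, then 2 µl); the variable names are illustrative assumptions.

```python
# Volumes in microlitres, as read from the reaction mix described above.
digest_mix_ul = {
    "gp37 plasmid DNA (1 ug/ul)": 2,
    "NEB buffer #2 (10X)": 1,
    "H2O": 6,
    "Bgl II (10 U/ul)": 1,
}

digest_volume_ul = sum(digest_mix_ul.values())

# A 10X buffer should end up at 1X in the digest.
buffer_fold = 10 * digest_mix_ul["NEB buffer #2 (10X)"] / digest_volume_ul

# Ligation step: 89 ul ligase buffer + 1 ul T4 DNA ligase are added.
ligation_volume_ul = digest_volume_ul + 89 + 1

# Stu I step: 2 ul of enzyme are added to the ligation reaction.
final_volume_ul = ligation_volume_ul + 2

# DNA concentration during ligation: 2 ul at 1 ug/ul = 2000 ng in total.
dna_ng_per_ul = 2 * 1.0 * 1000 / ligation_volume_ul

print(digest_volume_ul, buffer_fold, ligation_volume_ul,
      final_volume_ul, dna_ng_per_ul)
```

The digest therefore runs at 1X buffer in 10 µl, and the ligation proceeds after a ten-fold dilution of the digested DNA.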

The reaction mixture is then transformed into E. coli strain
BL21, obtained from Novagen, using standard procedures. The
transformation mixture is plated onto nutrient agar containing
100 µg/ml ampicillin, and the plates are incubated overnight
at 37°C.

Colonies that appear after overnight incubation are picked,
and plasmid DNA is extracted and digested with Bgl II as
above. The restriction digests are resolved on 1% agarose
gels. A successful deletion is evidenced by the appearance
after gel electrophoresis of a new DNA fragment of 4.2 kbp,
representing the undeleted part of gene 37 which is still
attached to the plasmid and which re-formed a Bgl II site by
ligation. The 1.2 kbp DNA fragment bounded by Bgl II sites in
the original gene is no longer in the plasmid and so is
missing from the gel.

Plasmids selected for the predicted deletion as above are
transformed into E. coli strain BL21(DE3). Transformants are
grown at 30°C until the density (A600) of the culture reaches
0.6. IPTG is then added to a final concentration of 0.4 mM and
incubation is continued at 30°C for 2 h, after which the
cultures are chilled on ice. 20 µl of the culture is then
removed and added to 20 µl of a two-fold concentrated
"cracking buffer" containing 1% sodium dodecyl sulfate,
glycerol, and tracking dye. 15 µl of this solution are loaded
onto a 10% polyacrylamide gel; a second aliquot of 15 µl is
first incubated in a boiling water bath for 3 min and then
loaded on the same gel. After electrophoresis, the gel is
fixed and stained. Expression of the deleted gp37 is
evidenced by the appearance of a protein species migrating at
an apparent molecular mass of 65-70,000 daltons in the boiled
sample. The extent of dimerization is suggested by the
intensity of higher-molecular mass species in the unboiled
sample and/or by the disappearance of the 65-70,000 dalton
protein band.

The ability of the deleted polypeptide to dimerize
appropriately is directly evaluated by testing its ability to
be recognized by an anti-P37 antiserum that reacts only with
mature P37 dimers, using a standard protein immunoblotting
procedure.

An alternative assay for functional dimerization of the
deleted P37 polypeptide (also referred to as ΔP37) is its
ability to complement in vivo a T4 37⁻ phage, by first
inducing expression of the ΔP37 and then infecting with the
T4 mutant, and detecting progeny phage.

A ΔP37 was prepared as described above, and found capable of
complementing a T4 37⁻ phage in vivo.



EXAMPLE 2: CONSTRUCTION AND EXPRESSION OF A gp37-36 CHIMER

The starting plasmid for this construction is one in which the
gene encoding gp37 is cloned immediately upstream (i.e., 5')
of the gene encoding gp36. The plasmid is digested with Hae
III, which deletes the entire 3' region of gp37 DNA downstream
of nucleotide 724 to the 3' terminus, and also removes the 5'
end of gp36 DNA from the 5' terminus to nucleotide 349. The
reaction mixture is identical to that described in Example 1,
except that a different plasmid DNA is used, and the enzyme is
Hae III. Ligation using T4 DNA ligase, bacterial
transformation, and restriction analysis are also performed as
in Example 1. In this case, excision of the central portion
of the gene 37-36 insert and religation reveals a novel insert
of 346 in-frame codons, which is cut only once by Hae III
(after nucleotide 725). The resulting construct is then
expressed in E. coli BL21(DE3) as described in Example 1.

Successful expression of the gp37-36 chimer is evidenced by
the appearance of a protein product of about 35,000 daltons.
This protein will have the first 242 N-terminal amino acids of
gp37 fused to the final 104 C-terminal amino acids of gp36
(numbered 118-221). The utility of this chimer depends upon
its ability to dimerize and attach end-to-end. That is, the
carboxy termini of said polypeptide will have the capability
of interacting with the amino terminus of the P37 protein
dimer of bacteriophage T4 and to form an attached dimer, and
the amino terminus of the dimer of said polypeptide will have
the capability of interacting with other said chimer
polypeptides. This property can be tested by assaying whether
introduction of ΔP37 initiates dimerization and
polymerization. Alternatively, polyclonal antibodies specific
to P36 dimer may be used to detect P36 subsequent to
initiation of dimerization by ΔP37.

A gp37-36 chimer was prepared similarly to the procedures
described above, except that the restriction enzyme TaqI was
used instead of Hae III. Briefly, the 5' fragment resulting
from TaqI digestion of gene 37 was ligated to the 3' fragment
resulting from TaqI digestion of gene 36. This produced a
construct encoding a gp37-36 chimer in which amino acids 1-48
of gp37 were fused to amino acids 100-221 of gp36. This
construct was expressed in E. coli BL21(DE3), and the chimer
was detected as an 18 kD protein. This gp37-36 chimer was
found to inhibit the growth of wild type T4 when expression of
the gp37-36 chimer was induced prior to infection (in an in
vitro phage inhibition assay).
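
The residue counts quoted for the two chimers can be checked, and their masses roughly estimated, with the sketch below. The figure of 110 Da per residue is a common rule-of-thumb assumption rather than a value from this document, so the estimates only approximate the apparent masses reported above.

```python
AVG_RESIDUE_MASS_DA = 110  # rule-of-thumb average; an assumption


def segment_length(first: int, last: int) -> int:
    """Number of residues in an inclusive residue range."""
    return last - first + 1


def chimer_summary(segments):
    """Total residue count and rough mass (Da) of fused segments."""
    n_residues = sum(segment_length(first, last) for first, last in segments)
    return n_residues, n_residues * AVG_RESIDUE_MASS_DA


# Hae III chimer: gp37 residues 1-242 fused to gp36 residues 118-221.
haeiii_chimer = chimer_summary([(1, 242), (118, 221)])

# TaqI chimer: gp37 residues 1-48 fused to gp36 residues 100-221.
taqi_chimer = chimer_summary([(1, 48), (100, 221)])

print(haeiii_chimer)  # residue count matches the 346 in-frame codons
print(taqi_chimer)
```

The Hae III construct comes out at 346 residues, in agreement with the 346 in-frame codons stated in the text, and the TaqI construct at 170 residues, consistent in order of magnitude with the 18 kD band.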

EXAMPLE 3: MUTATION OF THE GP37-36 CHIMER
TO PRODUCE COMPLEMENTARY SUPPRESSORS

The goal of this construction is to produce two variants of a
dimerizable P37-36 chimer: one in which the N-terminus of the
polypeptide is mutated (A, designated *P37-36) and one in
which the C-terminus of the polypeptide is mutated (B,
designated P37-36*). The requirement is that the mutated *P37
N-terminus cannot form a joint with the wild-type P36
C-terminus, but only with the mutated P36* C-terminus. The
rationale is that A and B each cannot polymerize independently
(as the parent P37-36 protein can), but can only associate
with each other sequentially (i.e.,
P37-36* + *P37-36 -> P37-36*-*P37-36).

A second construct, *P37-P36*, is formed by recombining
*P37-36 and P37-36* in vitro. When the monomers *gp37-36* and
gp37-36 are mixed in the presence of P37 initiator, gp37-36
would dimerize and polymerize to (P37-36)n; similarly, *P37
would only catalyze the polymerization of *gp37-36* to
(*P37-36*)n. In this case, the two chimers could be of
different size and different primary sequence with different
potential side-group interactions, and could initiate
attachment at different surfaces depending on the attachment
specificity of P37.
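
The pairing rules that this example aims for can be written down as a small lookup. The sketch assumes exactly two permitted joints (wild-type P36 C-terminus with wild-type P37 N-terminus, and mutated P36* C-terminus with mutated *P37 N-terminus); the data structures and function names are illustrative only.

```python
# Permitted joints, written as (C-terminal end, N-terminal end).
JOINTS = {("P36", "P37"), ("P36*", "*P37")}

# Each chimer is described by its (N-terminal end, C-terminal end).
CHIMERS = {
    "P37-36": ("P37", "P36"),
    "*P37-36": ("*P37", "P36"),
    "P37-36*": ("P37", "P36*"),
    "*P37-36*": ("*P37", "P36*"),
}


def can_join(upstream: str, downstream: str) -> bool:
    """True if the C-terminus of `upstream` can form a joint with the
    N-terminus of `downstream`."""
    c_end = CHIMERS[upstream][1]
    n_end = CHIMERS[downstream][0]
    return (c_end, n_end) in JOINTS


def can_self_polymerize(name: str) -> bool:
    return can_join(name, name)


for name in CHIMERS:
    print(name, can_self_polymerize(name))
print(can_join("P37-36*", "*P37-36"))
```

Under these rules the parent P37-36 and the doubly mutated *P37-36* each polymerize with themselves, while the singly mutated variants A and B do not, yet P37-36* can join to *P37-36 as stated above.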

The starting bacterial strain is a su° strain of E. coli
(which lacks the ability to suppress amber mutations). When
this strain is infected with a mutant T4 bacteriophage
containing amber mutations in genes 35, 36, and 37, phage
replication is incomplete, since the tail fiber proteins
cannot be synthesized. When this strain is first transformed
with a plasmid that directs the expression of the wild type
gp35, gp36 and gp37 genes and induced with IPTG, and
subsequently infected with mutant phage, infectious phage
particles are produced; this is evidenced by the appearance
of "nibbled" colonies. Nibbled colonies do not appear round,
with smooth edges, but rather have sectors missing. This is
caused by attack of a microcolony by a single phage, which
replicates and prevents the growth of the bacteria in the
missing sector.

For the purposes of this construction, the 3'-terminal region of
gene 36 (corresponding to the C-terminal region of gp36) is
mutagenized with randomly doped oligonucleotides. Randomly doped
oligonucleotides are prepared during chemical synthesis of
oligonucleotides, by adding a trace amount (up to a few percent)
of the other three nucleotides at a given position, so that the
resulting oligonucleotide mix has a small percentage of
incorrect nucleotides at that position. Incorporation of such
oligonucleotides into the plasmid will result in random
mutations (Hutchison et al., Methods Enzymol. 202:356, 1991).
The mutagenized population of plasmids (containing, however,
unmodified genes 36 and 37), is then transformed into the su°
bacteria, followed by infection with the mutant T4 phage as
above. In this case, the appearance of non-"nibbled" colonies
indicates that the mutated gp36 C-termini can no longer interact
with wild type P37 to form functional tail fibers. The putative
gp36 phenotypes found in such non-nibbled colonies are checked
for lack of dimeric N-termini by appropriate immunospecificity
as outlined above, and positive colonies are used as source of
plasmid for the next step.
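
As a rough illustration of the doping scheme described above, the number of substitutions carried by each oligonucleotide can be modeled as binomial, assuming every doped position is independently incorrect with the same probability. The oligonucleotide length (30 positions) and the doping fraction (3%) used below are hypothetical values chosen for the sketch; they are not figures from this example.

```python
from math import comb


def mutation_count_distribution(n_positions, doping_fraction, max_k=5):
    """Binomial probability of exactly k substituted positions, for k = 0..max_k.

    Assumes each doped position is incorrect independently with
    probability `doping_fraction` (an idealization of the synthesis).
    """
    p = doping_fraction
    return [
        comb(n_positions, k) * p**k * (1 - p) ** (n_positions - k)
        for k in range(max_k + 1)
    ]


def expected_mutations(n_positions, doping_fraction):
    """Mean number of substituted positions per oligonucleotide."""
    return n_positions * doping_fraction


# Hypothetical inputs: 30 doped positions, 3% incorrect nucleotide per position.
N_POSITIONS = 30
DOPING_FRACTION = 0.03

distribution = mutation_count_distribution(N_POSITIONS, DOPING_FRACTION)
for k, prob in enumerate(distribution):
    print(f"{k} substitutions: {prob:.3f}")
print("mean substitutions:", expected_mutations(N_POSITIONS, DOPING_FRACTION))
```

Under these assumed values most oligonucleotides carry zero or one substitution, which is the regime in which a phenotype can be traced to a single change.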

Several of these mutated plasmids are recovered and subjected to
a second round of mutagenesis, this time using doped
oligonucleotides that introduce random mutations into the
N-terminal region of gp37 present on the same plasmid. Again,
the (now doubly) mutagenized plasmids are transformed into the
su° strain of E. coli and transformants are infected with the
mutant T4 phage. At this stage, bacterial plates are screened
for the re-appearance of "nibbled" colonies. A nibbled colony at
this stage indicates that the phage has replicated by virtue of
suppression of the non-functional gp36* mutation(s) by the *P37
mutation. In other words, such colonies must contain novel *P37
polypeptides that have now acquired the ability to interact with
the P36* proteins encoded on the same plasmid.

The *P37-36 and P37-36* paired suppressor chimers (A and B as
above) are then constructed in the same manner as described in
Example 2. In this case, however, *P37 is used in place of wild
type P37 and P36* is used in place of wild type P36. A *P37-36*
chimer can now be made by restriction of *P37-36 and P37-36* and
religation in the recombined order. The *P37-36* can be mixed
with the P37-36 chimer, and the polymerization of each can be
accomplished independently in the presence of the other. This is
useful when the rod-like central portions of these chimers have
been modified in different ways.

EXAMPLE 4

DESIGN, CONSTRUCTION AND EXPRESSION OF A GP36-34 CHIMER

The starting plasmid for this construction is one in which the
vector containing gene 57 and the gene encoding gp36 is cloned
immediately upstream (i.e., 5') of the gene encoding gp34. The
plasmid is digested with NdeI, which cuts after bp 219 of gene
36 and after bp 2594 of gene 34, thereby deleting the final 148
C-terminal codons from the gp36 moiety and the first 865
N-terminal codons from the gp34 moiety. The reaction mixture is
identical to that described in Example 1, except that a
different plasmid DNA is used, and the enzyme used is NdeI
(NEB). Ligation using T4 DNA ligase, bacterial transformation,
and restriction analysis are also performed as in Example 1.
This results in a new hybrid gene encoding a protein of 497
amino acids (73 N-terminal amino acids of gp36 and 424
C-terminal amino acids of gp34, numbered 866-1289).

Alternatively, the starting plasmid is cut with [illegible] in
gene 34, and the Exo-Size Deletion Kit (NEB) is used to create
deletions as described above.

The resulting construct is then expressed in E. coli BL21(DE3)
as described in Example 1. Successful expression of the gp36-34
chimer is evidenced by the appearance of a protein product of
about 55,000 daltons. Preferably, the amino termini of the
polypeptide homodimer have the capability of interacting with
the gp35 protein, and then the carboxy termini have the
capability of interacting with other attached gp35 molecules.
Successful formation of the dimer can be detected by reaction
with anti-P36 antibodies, by attachment of gp35, or by the in
vitro phage inhibition assay described in Example 2.
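
The figures quoted for the gp36-34 chimer can be cross-checked with a few lines of arithmetic. The sketch below recomputes the residue counts from the cut position in gene 36 and the gp34 deletion, then estimates the mass using an average of about 110 daltons per residue. That average is a common rule of thumb and an assumption of this sketch, not a value taken from this example.

```python
# Values stated in the text for this construction.
GENE36_CUT_BP = 219          # NdeI cuts after this base pair of gene 36
GP34_LENGTH = 1289           # residues in full-length gp34
GP34_DELETED_CODONS = 865    # N-terminal codons removed from gp34

# Assumption (rule of thumb): mean residue mass of a protein, in daltons.
AVG_RESIDUE_MASS_DA = 110.0


def chimer_composition(cut_bp, partner_length, partner_deleted):
    """Return (N-terminal residues kept, C-terminal residues kept, total)."""
    n_term = cut_bp // 3                        # three base pairs per codon
    c_term = partner_length - partner_deleted   # residues 866-1289 of gp34
    return n_term, c_term, n_term + c_term


n_term, c_term, total = chimer_composition(
    GENE36_CUT_BP, GP34_LENGTH, GP34_DELETED_CODONS
)
approx_mass = total * AVG_RESIDUE_MASS_DA

print(f"gp36 residues kept: {n_term}")
print(f"gp34 residues kept: {c_term}")
print(f"chimer length: {total} residues")
print(f"approximate mass: {approx_mass:.0f} Da")
```

The estimate lands close to the "about 55,000 daltons" expected for the expressed product, so the residue counts and the expected band size are mutually consistent.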



EXAMPLE 5

ISOLATION OF THERMOLABILE PROTEINS FOR SELF-ASSEMBLY

Thermolabile structures can be utilized in nanostructures for:
a) initiation of chimer polymerization (e.g., gp37-36) at low
temperature and subsequent inactivation of and separation from
the initiator at high temperature; b) initiation of angle
formation between P36 and gp35 (e.g., variants of gp35 that have
thermolabile attachment sites for P36 N-termini or P34
C-termini, a variant P36 that forms a thermolabile attachment to
gp35, and a variant P34 with a thermolabile C-terminal
attachment site). Thermolability may be reversible, permitting
reattachment of the appropriate termini when the lower
temperature is restored, or it may be irreversible.

To create a variant gp37 that permits heat induced separation of
the P36-P37 junction, the 5' end of gp37 DNA is randomly
mutagenized using doped oligonucleotides as described above. The
mutagenized DNA fragment is then recombined into T4 phage by
infection of the cell containing the mutagenized DNA by a T4
phage containing two amber mutations flanking the mutagenized
region. Following a low-multiplicity infection, non-amber phage
are selected at low temperature on E. coli su° at 30°C. The
progeny of these plaques are resuspended in buffer and
challenged by heating at 60°C. At this temperature, wild-type
tail fibers remain intact and functional, whereas the
thermolabile versions release the terminal P37 units and thus
render those phage non-infectious.

At this stage, wild type phage are removed by: 1) adsorbing the
wild type phage to sensitive bacteria and sedimenting (or
filtering out) the bacteria with the adsorbed wild type phage;
or 2) reacting the lysate with anti-P37 antibody, followed by
immobilized Protein A and removal of adsorbed wild type phage.
Either method leaves the noninfectious mutant phage particles in
the supernatant fluid or filtrate, from which they can be
recovered. The non-infectious phage lacking terminal P37
moieties (and probably the rest of the tail fibers as well) are
then treated with 6M urea and mixed with bacterial spheroplasts
to permit infection at low multiplicity, whereupon they
replicate at low temperature and release progeny.
Alternatively, infectious phage are reconstituted by in vitro
incubation of the mutant phage with wild type P37 at 30°C; this
is followed by infection of intact bacterial cells using the
standard protocol. The latter method of infection specifically
selects mutant phage in which the thermolability of the P36-P37
junction is reversible.

Using either method, the phage populations are subjected to
multiple rounds of selection as above, after which individual
phage particles are isolated by plaque purification at 30°C.
Finally, the putative mutants are evaluated individually for the
following characteristics: 1) loss of infectivity after
incubation at high temperatures (40-60°C), as measured by a
decrease in titer; 2) loss of P37 after incubation at high
temperature, as measured by decrease in binding of P37-specific
antibody to phage particles; and 3) morphological changes in the
tail fibers after incubation at high temperatures, as assessed
by electron microscopy.

After mutants are isolated and their phenotypes confirmed, the
P37 gene is sequenced. If the mutations localize to particular
regions or residues, those sequences are targeted for
site-directed mutagenesis to optimize the desired
characteristics.

Finally, the mutant gene 37 is cloned into expression plasmids
and expressed individually in E. coli as in Example 1. The
mutant P37 dimers are then purified from bacterial extracts and
used in in vitro assembly reactions.

In a similar fashion, mutant gp35 polypeptides can be isolated
that exhibit a thermolabile interaction with the N-terminus of
P36 or the C-terminus of P34. For thermolabile interaction with
P34, phage are incubated at high temperature, resulting in the
loss of the entire distal half of the tail fiber (i.e.,
gp35-P36-P37). The only difference in the experimental protocol
is that, in this case, 1) random mutagenesis is performed over
the entire gp35 gene; 2) wild-type phage (and distal half-fibers
from thermolabile mutants) are separated from thermolabile
mutant phage that have been inactivated at high temperature (but
still have proximal half tail fibers attached) by precipitating
both the distal half-fibers and the phage particles containing
intact tail fibers with any of the anti-distal half tail-fiber
antibodies followed by Staphylococcal A-protein beads; 3) the
mutant phage remaining in the supernatant are reactivated by
incubation at low temperature with bacterial extracts containing
wild type intact distal half fibers; and 4) stocks of
thermolabile gene 35 mutants grown at 30°C can be tested for
reversible thermolability by inactivation at 60°C and
reincubation at 30°C. Inactivation is performed on a
concentrated suspension of phage, and reincubation at 30°C is
performed either before or after dilution. If phage are
successfully reactivated before, but not after, dilution, this
indicates that their gp35 is reversibly thermolabile.

To create a gene 36 mutation with a thermolabile gp35-P36
linkage, the C-terminus of gene 36 is mutagenized as described
above, and the mutant selected for reversibility. An alternative
is to mutagenize gp35 to create a gene 35 mutant in which the
gp35-P36 linkage will dissociate at 60°C. In this case,
incubation with anti-gp35 antibodies can be used to precipitate
the phage without P36-P37 and thus to separate them from the
wild-type phage and distal half-tail fibers (P36-P37), since the
variant gp35 will remain attached to P34.

EXAMPLE 6

ASSEMBLY OF ONE-DIMENSIONAL RODS
A. Simple Assembly: The P37-36 chimer described in Example 2 is
capable of self-assembly, but requires a P37 initiator to bind
the first unit of the rod. Therefore, a P37 or a ΔP37 dimer is
either attached to a solid matrix or is free in solution to
serve as an initiator. If the initiator is attached to a solid
matrix, a thermolabile P37 dimer is preferably used. Addition of
an extract containing gp37-36, or the purified gp37-36 chimer,
results in the assembly of linear multimers of increasing
length. In the matrix-bound case, the final rods are released by
a brief incubation at high temperature (40-60°C, depending on
the characteristics of the particular thermolabile P37 variant).

The ratio of initiator to gp37-36 can be varied, and the size
distribution of the rods is measured by any of the following
methods: 1) size exclusion chromatography; 2) increase in the
viscosity of the solution; and 3) direct measurement by electron
microscopy.
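
To see why the initiator-to-chimer ratio controls rod size, a simple stoichiometric estimate can help. The sketch below assumes that every initiator dimer starts exactly one rod and that all of the chimer is eventually incorporated; both are idealizations introduced here, and the concentrations are hypothetical. Real preparations would show a distribution around this mean, which is what the three measurement methods above are meant to characterize.

```python
def mean_units_per_rod(chimer_conc, initiator_conc):
    """Number-average rod length, in chimer units per initiator.

    Assumes one rod per initiator and complete incorporation of chimer.
    Both concentrations must be in the same (arbitrary) units.
    """
    if initiator_conc <= 0:
        raise ValueError("initiator concentration must be positive")
    return chimer_conc / initiator_conc


# Hypothetical titration: fixed chimer, decreasing amounts of initiator.
CHIMER_CONC = 10.0
for initiator in (5.0, 1.0, 0.5, 0.1):
    units = mean_units_per_rod(CHIMER_CONC, initiator)
    print(f"initiator {initiator:>4}: about {units:.0f} units per rod")
```

Lowering the initiator concentration at fixed chimer concentration raises the expected mean length in this idealized picture.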

B. Staged assembly: The P37-36 variants *P37-36 and P37-36*
described in Example 3 cannot self-polymerize. This allows the
staged assembly of rods of defined length, according to the
following protocol:

1. Attach initiator P37 (preferably thermolabile) to a matrix.

2. Add excess *gp37-36 to attach and oligomerize as P37-36
homooligomers to the N-terminus of P37.

3. Wash out unreacted *gp37-36 and flood with gp37-36*.

4. Wash out unreacted gp37-36* and flood with excess *gp37-36.

5. Repeat steps 2-4, n-1 times.

6. Release assembly from matrix by brief incubation at high
temperature as above.

The linear dimensions of the protein rods in the batch will
depend upon the lengths of the unit heterochimers and the number
of cycles (n) of addition. This method has the advantage of
insuring absolute reproducibility of rod length and a
homogeneous, monodisperse size distribution from one preparation
to another.
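
The dependence of rod length on unit length and cycle number can be sketched as a simple tally. The model below is one reading of the protocol, not a statement of it: it assumes that each flood step adds exactly one unit to the growing end and that the two variants alternate strictly. The unit lengths (in nanometers) are hypothetical placeholders, since the text does not give them.

```python
from itertools import cycle, islice


def staged_rod(n_additions, length_a_nm, length_b_nm):
    """Return (order of units, total rod length) for alternating additions.

    'A' stands for the *P37-36 variant and 'B' for P37-36*.
    Assumes one unit is added per flood step and strict A/B alternation.
    """
    if n_additions < 0:
        raise ValueError("number of additions cannot be negative")
    order = list(islice(cycle(["A", "B"]), n_additions))
    unit_length = {"A": length_a_nm, "B": length_b_nm}
    total = sum(unit_length[u] for u in order)
    return order, total


# Hypothetical unit lengths; not values from the text.
LENGTH_A_NM = 40.0
LENGTH_B_NM = 55.0

order, total_nm = staged_rod(5, LENGTH_A_NM, LENGTH_B_NM)
print("addition order:", "-".join(order))
print("rod length (nm):", total_nm)
```

Because every rod in the batch sees the same sequence of additions, the computed length is a single value rather than a distribution, which is the point the paragraph above makes about monodispersity.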



EXAMPLE 7

STAGED ASSEMBLY OF POLYGONS

The following assembly strategy utilizes gp35 as an angle joint
to allow the formation of polygons. For the purpose of this
example, the angle formed by gp35 is assumed to be 137°. The rod
unit comprises the P36-34 chimer described in Example 4, which
is incapable of self-polymerization. The P36-34 homodimer is
made from a bacterial clone in which both gp36-34 and gp57 are
expressed. The gp57 can chaperone the homodimerization of
gp36-34 to P36-34.

1. Initiator: The incomplete distal half fiber P36-37 is
attached to a solid matrix by the P37 C-terminus. Thermolabile
gp35 as described in Example 5 is then added to form the intact
initiator.

2. Excess P36-34 chimer is added to attach a single P36-34.
Following binding to the matrix via gp35, the unbound chimer is
washed out.

3. Wild-type (i.e., non-thermolabile) gp35 is then added in
excess. After incubation, the unbound material is washed out.

4. Steps 2 and 3 are repeated 7-8 times.

5. The assembly is released from the matrix by brief incubation
at high temperature.

The released polymeric rod, 8 units long, will form a regular
8-sided polygon, whose sides comprise the P36-34 dimer and whose
joints comprise the wild-type gp35 monomer. However, there will
be some multimers of these 8 units bound as helices. When a unit
does not close, but instead adds another to its terminus, the
unit cannot close further and the helix can build in either
direction. The direction of the first overlap also determines
the handedness of the helix. Ten (or seven)-unit rods may form
helices more frequently than polygons since their natural angles
are 144° (or 128.6°). The likelihood of closure of a regular
polygon depends not only on the average angle of gp35 but also
on its flexibility, which can be further manipulated by genetic
or environmental modification.
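
The angles quoted above follow from the standard interior-angle formula for a regular polygon, 180(n-2)/n degrees. The short sketch below recomputes them and compares each with the 137° assumed for gp35 in this example; the function names are illustrative only.

```python
def interior_angle(n_sides):
    """Interior angle, in degrees, of a regular polygon with n_sides sides."""
    if n_sides < 3:
        raise ValueError("a polygon needs at least 3 sides")
    return 180.0 * (n_sides - 2) / n_sides


def ideal_side_count(joint_angle_deg):
    """Non-integer side count whose regular polygon has this interior angle."""
    return 360.0 / (180.0 - joint_angle_deg)


def closure_mismatch(n_sides, joint_angle_deg):
    """Joint angle minus the interior angle needed for a flat n-sided ring."""
    return joint_angle_deg - interior_angle(n_sides)


GP35_ANGLE_DEG = 137.0  # angle assumed for gp35 in this example

for n in (7, 8, 10):
    print(
        f"{n} sides: interior angle {interior_angle(n):.1f} deg, "
        f"mismatch {closure_mismatch(n, GP35_ANGLE_DEG):+.1f} deg"
    )
print(f"ideal side count for 137 deg: {ideal_side_count(GP35_ANGLE_DEG):.2f}")
```

An octagon needs 135° at each corner, only 2° from the assumed joint angle, whereas 7- and 10-sided rings need 128.6° and 144°. This is consistent with the statement that eight-unit rods are the most likely to close and that the other lengths tend toward helices unless the joint flexes.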

The type of polygon that is formed using this protocol depends
upon the length of rod units and the angle formed by the angle
joint. For example, alternating rod units of different sizes can
be used in step 2. In addition, variant gp35 polypeptides that
form angles different than the natural angle of 137° can be
used, allowing the formation of different regular polygons.
Furthermore, for a given polygon with an even number of sides
and equal angles, the sides in either half can be of any size
provided the two halves are symmetric.

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Goldberg, Edward B.

(ii) TITLE OF INVENTION: MATERIALS FOR THE PRODUCTION OF 
NANOMETER STRUCTURES AND USE THEREOF 

(iii) NUMBER OF SEQUENCES: 6

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Pennie and Edmonds

(B) STREET: 1155 Avenue of the Americas

(C) CITY: New York 

(D) STATE: New York 

(E) COUNTRY: US 

(F) ZIP: 10036 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: To be Assigned 

(B) FILING DATE: 13-OCT-1995 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Misrock, S. Leslie

(B) REGISTRATION NUMBER: 18,872

(C) REFERENCE/DOCKET NUMBER: 8471-0005-999

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (212) 790-9090 

(B) TELEFAX: 212-869-8864 

(C) TELEX: 66441 PENNIE 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8855 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacteriophage T4 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: TAIL FIBER GENES 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TAGGAGCCCG GGAGAATGGC CGAGATTAAA AGAGAATTCA GAGCAGAAGA TGGTCTGGAC 60

GGAGGTGGTG ATAAAATAAT CAACGTAGCT TTAGCTGATC GTACCGTAGG AACTGACGGT 120

GTTAACGTTG ATTACTTAAT TCAAGAAAAC ACAGTTCAAC ACTATGATCC AACTCGTGGA 180

TATTTAAAAG ATTTTGTAAT CATTTATGAT AACCGCTTTT GGGCTGCTAT AAATGATATT 240

CCAAAACCAG CAGGAGCTTT TAATAGCGGA CGCTGGAGAG CATTACGTAC CGATCCTAAC 300

TGGATTACGG TTTCATCTGG TTCATATCAA TTAAAATCTG GTGAAGCAAT TTCGGTTAAC 360

ACCGCAGCTG GAAATGACAT CACGTTTACT TTACCATCTT CTCCAATTGA TGGTGATACT 420

ATCGTTCTCC AAGATATTGG AGGAAAACCT GGAGTTAACC AAGTTTTAAT TGTAGCTCCA 480

GTACAAAGTA TTGTAAACTT TAGAGGTGAA CAGGTACGTT CAGTACTAAT GACTCATCCA 540

AAGTCACAGC TAGTTTTAAT TTTTAGTAAT CGTCTGTGGC AAATGTATGT TGCTGATTAT 600

AGTAGAGAAG CTATAGTTGT AACACCAGCG AATACTTATC AAGCGCAATC CAACGATTTT 660

ATCGTACGTA GATTTACTTC TGCTGCACCA ATTAATGTCA AACTTCCAAG ATTTGCTAAT 720

CATGGCGATA TTATTAATTT CGTCGATTTA GATAAACTAA ATCCGCTTTA TCATACAATT 780

GTTACTACAT ACGATGAAAC GACTTCAGTA CAAGAAGTTG GAACTCATTC CATTGAAGGC 840

CGTACATCGA TTGACGGTTT CTTGATGTTT GATGATAATG AGAAATTATG GAGACTGTTT 900

GACGGGGATA GTAAAGCGCG TTTACGTATC ATAACGACTA ATTCAAACAT TCGTCCAAAT 960

GAAGAAGTTA TGGTATTTGG TGCGAATAAC GGAACAACTC AAACAATTGA GCTTAAGCTT 1020

CCAACTAATA TTTCTGTTGG TGATACTGTT AAAATTTCCA TGAATTACAT GAGAAAAGGA 1080

CAAAGAGTTA AAATCAAAGC TGCTGATGAA GATAAAATTG CTTCTTCAGT TCAATTGCTG 1140

CAATTCCCAA AACGCTCAGA ATATCCACCT GAAGCTGAAT GGGTTACAGT TCAAGAATTA 1200

CTTTTTAACG ATGAAACTAA TTATGTTCCA GTTTTGGAGC TTGCTTACAT AGAAGATTCT 1260

GATGGAAAAT ATTGGGTTGT ACAGCAAAAC GTTCCAACTG TAGAAAGAGT AGATTCTTTA 1320

AATGATTCTA CTAGAGCAAG ATTAGGCGTA ATTGCTTTAG CTACACAAGC TCAAGCTAAT 1380

GTCGATTTAG AAAATTCTCC ACAAAAAGAA TTAGCAATTA CTCCAGAAAC GTTAGCTAAT 1440

CGTACTGCTA CAGAAACTCG CAGAGGTATT GCAAGAATAG CAACTACTGC TCAAGTGAAT 1500

CAGAACACCA CATTCTCTTT TGCTGATGAT ATTATCATCA CTCCTAAAAA GCTGAATGAA 1560

AGAACTGCTA CAGAAACTCG TAGAGGTGTC GCAGAAATTG CTACGCAGCA AGAAACTAAT 1620

GCAGGAACCG ATGATACTAC AATCATCACT CCTAAAAAGC TTCAAGCTCG TCAAGGTTCT 1680

GAATCATTAT CTGGTATTGT AACCTTTGTA TCTACTGCAG GTGCTACTCC AGCTTCTAGC 1740

CGTGAATTAA ATGGTACGAA TGTTTATAAT AAAAACACTG ATAATTTAGT TGTTTCACCT 1800

AAAGCTTTGG ATCAGTATAA AGCTACTCCA ACACAGCAAG GTGCAGTAAT TTTAGCAGTT 1860

GAAAGTGAAG TAATTGCTGG ACAAAGTCAG CAAGGATGGG CAAATGCTGT TGTAACGCCA 1920

GAAACGTTAC ATAAAAAGAC ATCAACTGAT GGAAGAATTG GTTTAATTGA AATTGCTACG 1980

GAAAGTGAAG TTAATACAGG AACTGATTAT ACTCGTGCAG TCACTCCTAA AACTTTAAAT 2040

GACOGTAGAG CAACTGAAAG TTTAAGTGGT ATAGCTGAAA TTGCTACACA AGTTGAATTC 2100

CACGCAGCCC TCCACGATAC TCGTATCTCT ACACCATTAA AAATTAAAAC CAGATTTAAT 2160


AGTACTGATC CTACTTCTCT TGTTGCTCTA TCTGGATTAG TTGAATCAGG AACTCTCTGG 2220

OACCATTATA CACTTAATAT TCTTGAAGCA AATGAGACAC AACGTGGTAC ACTTCGTGTA 2280

GCTACGCAGG TOGAAGCTGC TGCGGGAACA TTAGATAATG TTTTAATAAC TCCTAAAAAG 2340

CTTTTAGGTA CTAAATCTAC TOAAGOGCAA GAGGGTGTTA TTAAAGTTCC AACTCAGTCT 2400

GAAACTGTGA CTGGAACGTC AGCAAATACT GCTGTATCTC GAAAAAATTT AAAATGGATT 2460

0GCA6AGTG AACCTACTTG GGCAGCTACT ACTGCAATAA GAGGTTTTGT TAAAACTTCA 2520

TCTC6TTCAA TTACATTCCT TCGTAATGAT ACAGTCGGTT CTACCCAAGA TTTAGAACTG 2580

TATCACAAAA ATAGCTATGC GGTATCACCA TATCAATTAA ACCGTGTATT ACCAAATTAT 2640

TTGCCACTAA AAGCAAAAGC TGCTGATACA AATTTATTGG ATGGTCTAGA TTCATCTCAG 2700

TTCATTCGTA GGGATATTGC ACAGACGGTT AATGGTTCAC TAACCTTAAC CCAACAAACG 2760

AATCTGA6TG CCCCTCTTGT ATCATCTAGT ACTGGTGAAT TTGGTGGTTC ATTGGCOGCT 2820

AATAGAACAT TTACCATCCG TAATACAGGA GCCCCGACTA GTATCGTTTT CGAAAAAGGT 2880

CCTGCATCCG GGGCAAATCC TGCACAGTCA ATGAGTATTC GTGTATGGGG TAACCAATTT 2940

GGCGGCGGTA GTGATACGAC CCGTTCGACA GTGTTTGAAG TTGGCGATGA CACATCTCAT 3000

CACTTTTATT CTCAACGTAA TAAAGACGGT AATATAGCGT TTAACATTAA TGGTACTGTA 3060

ATGCCAATAA AGATTAATGC TTCCGGTTTG ATGAATGTGA ATGGCACTGC AACATTCGGT 3120

OGTTCAGTTA CAGCCAATGG TGAATTCATC AGCAAGTCTG CAAATGCTTT TAGAGCAATA 3180

AACGCTGATT ACGGATTCTT TATTCGTAAT GATGCCTCTA ATACCTATTT TTTGCTCACT 3240

GCAGCCGGTG ATCAGACTGG TGGTTTTAAT GGATTACGCC CATTATTAAT TAATAATCAA 3300

TCCGGTCAGA TTACAATTGG TGAAGGCTTA ATCATTGCCA AAGGTGTTAC TATAAATTGA 3360

GGCGGTTTAA CTGTTAACTC GAGAATTCGT TCTCAGGGTA CTAAAACATC TGATTTATAT 3420

ACCCGTGCGC CAACATCTGA TACTGTAGGA TTCTGGTCAA TCGATATTAA TGATTCAGCC 3480

ACTTATAACC AGTTCCCGGG TTATTTTAAA ATGGTTGAAA AAACTAATGA AGTGACTGGG 3540

CTTCCATACT TAGAACGTGG CGAAGAAGTT AAATCTCCTG GTACACTGAC TCAGTTTGGT 3600

AACACACTTG ATTCGCTTTA CCAAGATTGG ATTACTTATC GAACGACGCC AGAAGCGCGT 3660

ACCACTCGCT GGACACGTAC ATGGCAGAAA ACCAAAAACT CTTGGTCAAG TTTTGTTCAG 3720

GTATTTGACG GAGGTAACCC TCCTCAACCA TCTGATATCG GTGCTTTACC ATCTGATAAT 3780

GCTACAATGG GGAA7CTTAC TATTCGTGAT TTCTTGCGAA TTGGTAATGT TCGCATTGTT 3840

CCTGACCCAG TGAATAAAAC GGTTAAATTT GAATGGGTTG AATAAGAGGT ATTATGGAAA 3900

AATTTATGGC CGAGATTTGG ACAAGGATAT GTCCAAACGC CATTTTATCG GAAAGTAATT 3960

CAGTAAGATA TAAAATAAGT ATAGCGGGTT CTTGCCCGCT TTCTACAGCA GGACCATCAT 4020

ATGTTAAATT TCAGGATAAT CCTGTAGGAA GTCAAACATT TAGGCGCAGG CCTTCATTTA 4080

AGAGTTTTTG ACCCTTCCAC CGGAGCATTA GTTGATAGTA AGTCATATGC TTTTTCGACT 4140

TCAAATGATA CTACATCAGC TGCTTTTGTT AGTTTTCATG AATTCTTTGA CGAATAATCG 4200


AATTGTTGCT ATATTAACTA GTGGAAAGGT TAATTTTCCT CCTGAAGTAG TATCTTGGTT 4260

AAGAACCGCC GGAACGTCTG CCTTTCCATC TGATTCTATA TTGTCAAGAT TTGACGTATC 4320

ATATGCTGCT TTTTATACTT CTTCTAAAAG AGCTATCGCA TTAGAGCATG TTAAACTGAG 4380

TAATAGAAAA AGCAGAGATG ATTATCAAAC TATTTTAGAT GTTGTATTTG ACAGTTTAGA 4440

AGATGTAGGA GCTACCGGGT TTCCAAGAAG AACGTATGAA AGTGTTGAGC AATTCATGTC 4500

CGCACTTGGT GGAACTAATA ACGAAATTGC CAGATTGCCA ACTTCAGCTG CTATAAGTAA 4560

ATTATCTCAT TATAATTTAA TTCCTGGAGA TGTTCTTTAT CTTAAAGCTC AGTTATATGC 4620

TCATGCTCAT TTACTTGCTC TTGGAACTAC AAATATATCT ATCCGTTTTT ATAATGCATC 4680

TAACCCATAT ATTTCTTCAA CAGAAGCTGA ATTTACTGGG CAAGCTGGGT CATGGGAATT 4740

AAAGGAAGAT TATGTAGTTG TTCCAGAAAA CGCAGTAGGA TTTACGATAT ACGCACAGAG 4800

AACTGCACAA GCTGGCCAAG GTGGCATGAG AAATTTAAGC TTTTCTGAAG TATCAAGAAA 4860

TGGCGGCATT TCGAAACCTG CTGAATTTGG CGTCAATGGT ATTCGTGTTA ATTATATCTG 4920

CGAATCCGCT TCACCTCCGG ATATAATGGT ACTTCCTACG CAAGCATCGT CTAAAACTGG 4980

TAAAGTGTTT GGGCAAGAAT TTAGAGAAGT TTAAATTGAG GGACCCTTCG GGTTCCCTTT 5040

[missing] ATACTATTCA AATAAAGGGG CATACAATGG CTGATTTAAA AGTAGGTTCA 5100

ACAACTGGAG GCTCTGTCAT TTGGCATCAA GGAAATTTTC CATTGAATCC AGCCGGTGAC 5160

GATGTACTCT ATAAATCATT TAAAATATAT TCAGAATATA ACAAACCACA AGCTGCTGAT 5220

AACGATTTCG TTTCTAAAGC TAATGGTGGT ACTTATGCAT GAAAGGTAAC ATTTAACGCT 5280

GGCATTCAAG TCCCATATGC TCCAAACATC ATGAGCCCAT GCGGGATTTA TGGGGGTAAC 5340

GGTGATGGTG CTACTTTTGA TAAAGCAAAT ATCGATATTG TTTCATGGTA TGGCGTAGGA 5400

TTTAAATCGT CATTTGGTTC AACAGGCCGA ACTGTTGTAA TTAATACACG CAATGGTGAT 5460

ATTAACACAA AAGGTGTTGT GTCGGCAGCT GGTGAAGTAA GAAGTGGTGC GGCTGCTCCT 5520

ATAGCAGCGA ATGACCTTAC TAGAAAGGAC TATGTTGATG GAGCAATAAA TACTGTTACT 5580

GGAAATGCAA ACTCTAGGGT GCTACGGTCT GGTGACACCA TGACAGGTAA TTTAACAGCG 5640

CCAAACTTTT TCTCGCAGAA TCCTGCATCT CAACCCTCAC ACGTTCCACG ATTTGACCAA 5700

ATCGTAATTA AGGATTCTGT TCAAGATTTC GGCTATTATT AAGAGGACTT ATGGCTACTT 5760

TAAAACAAAT ACAATTTAAA AGAAGCAAAA TCGCAGGAAC ACGTCCTGCT GCTTCAGTAT 5820

TAGCCGAAGG TGAATTGGCT ATAAACTTAA AAGATAGAAC AATTTTTACT AAAGATGATT 5880

CAGGAAATAT CATCGATCTA GGTTTTGCTA AAGGCGGGCA ACTTGATGGC AACGTTACTA 5940

TTAACGGACT TTTGAGATTA AATGGCGATT ATGTACAAAC AGGTGGAATG ACTGTAAACG 6000

GACCCATTGG TTCTACTGAT GGCGTCACTG GAAAAATTTT CAGATCTACA CAGGGTTCAT 6060

TTTATGCAAG AGCAACAAAC GATACTTCAA ATGCCCATTT ATGGTTTGAA AATGCCGATG 6120

GCACTGAACG TGGCGTTATA TATGCTCGCC CTCAAACTAC AACTGACGGT GAAATACGCC 6180

TTAGGGTTAG ACAAGGAACA GGAAGCACTG CCAACAGTGA ATTCiATTTC OGCTCTATAA 6240


ATGGAGGCGA ATTTCAGGCT AACCGTATTT TAGCATCAGA TTCGTTAGTA ACAAAACGCA 6300

TTOCGGTTGA TACCGTTATT CATGATGCCA AAGCATTTGG ACAATATGAT TCTCACTCTT 6360

TGGTTAATTA TGTTTATCCT GGAACCGGTG AAACAAATGG TGTAAACTAT CTTCGTAAAG 6420

TTCGCGCTAA GTCCGGTGGT ACAATTTATC ATGAAATTGT TACTGCACAA ACAGGCCTGG 6480

CTGATGAACT TTCTTGGTGG TCTGGTGATA CACCAGTATT TAAACTATAC GGTATTCGTG 6540

ACGATGGCAG AATGATTATC CGTAATAGCC TTGCATTAGG TACATTCACT ACAAATTTCC 6600

CGTCTAGTGA TTATGGCAAC GTCGGTGTAA TGGGCCATAA GTATCTTGTT CTCGGCGACA 6660

CTGTAACTGG CTTGTCATAC AAAAAAACTG GTGTATTTGA TCTAGTTGGC GGTGGATATT 6720

CTGTTGCTTC TATTACTCCT GACAGTTTCC GTAGTACTCG TAAAGGTATA TTTGGTCGTT 6780

CTGAGGACCA AGGCGCAACT TGGATAATGC CTGGTACAAA TGCTGCTCTC TTGTCTGTTC 6840

AAACACAAGC TGATAATAAC AATGCTGGAG ACGGACAAAC CCATATCGGG TACAATGCTG 6900

GCGGTAAAAT GAACCACTAT TTCCGTGGTA CAGGTCAGAT GAATATCAAT ACCCAACAAG 6960

GTATGGAAAT TAACCCGGGT ATTTTGAAAT TGGTAACTGG CTCTAATAAT GTACAATTTT 7020

ACGCTGACGG AACTATTTCT TCCATTCAAC CTATTAAATT AGATAACGAG ATATTTTTAA 7080

CTAAATCTAA TAATACTGCG GGTCTTAAAT TTGGAGCTCC TAGCCAAGTT GATGGCACAA 7140

GGACTATCCA ATGGAACGGT GGTACTCGCG AAGGACAGAA TAAAAACTAT GTGATTATTA 7200

AAGCATGGGG TAACTCATTT AATGCCACTG GTGATAGATC TCGCGAAACG GTTTTCCAAG 7260

TATCAGATAG TCAAGGATAT TATTTTTATG CTCATCGTAA AGCTCCAACC GGCGACGAAA 7320

CTATTGGACG TATTGAAGCT CAATTTGCTG GGGATGTTTA TGCTAAAGGT ATTATTGCCA 7380

ACGGAAATTT TAGAGTTGTT GGGTCAAGCG CTTTAGCCGG CAATGTTACT ATGTCTAACG 7440

[missing] CCAAGGTGGT TCTTCTATTA CTGGACAAGT TAAAATTGGC CGAACAuCAA 7500

ACGCACTGAG AATTTGGAAC GCTGAATATG CTGCTATTTT CCGTCGTTCG GAAAGTAACT 7560

TTTATATTAT TCCAACCAAT CAAAATGAAG GAGAAAGTGG AGACATTCAC AGCTCTTTGA 7620

GACCTGTGAG AATAGGATTA AACGATGGCA TGGTTGGGTT AGGAAGAGAT TCTTTTATAG 7680

TAGATCAAAA TAATGCTTTA ACTACGATAA ACAGTAACTC TCGCATTAAT GCCAACTTTA 7740

GAATGCAATT GGGCCAGTCG GCATACATTG ATGCAGAATG TACTGATGCT GTTCGCCCGG 7800

OGGGTGCAGG TTCATTTGCT TCCCAGAATA ATGAAGACGT CCGTGCGCCG TTCTATATGA 7860

ATATTGATAG AACTGATGCT AGTGCATATG TTCCTATTTT GAAACAACGT TATGTTCAAC 7920

GCAATGGCTG CTATTCATTA GGGACTTTAA TTAATAATGG TAATTTCCGA GTTCATTACC 7980

ATGGCGGCGG AGATAACGGT TCTACAGGTC CACAGACTGC TGATTTTGGA TGGGAATTTA 8040

TTAAAAACGG TGATTTTATT TCACCTCGCG ATTTAATAGC AGGGAAAGTC AGATTTGATA 8100

GAACTGGTAA TATCACTCGT GGTTCTGGTA ATTTTGCTAA CTTAAACAGT ACAATTGAAT 8160

CACTTAAAAC TGATATCATG TCGAGTTACC CAATTGGTGC TCCGATTCCT TGGCCGAGT 8220

ATTCAGTTCC TGCTGCATTT GCTTTGATGG AAGGTCAGAC CTTTGATAAG TCCGCATATC 8280

CAAAGTTAGC TGTTGCATAT CCTAGCGGTG TTATTCCAGA TATGCGCGGG CAAACTATCA 8340

AGCGTAAACC AAGTGGTCGT GCTGTTTTGA GCGCTGAGGC AGATGGTGTT AAGGCTCATA 8400

GCCATAGTGC ATCGGCTTCA AGTACTGACT TAGGTACTAA AACCACATCA AGCTTTGACT 8460

ATGGTACGAA GGGAACTAAC AGTAOGGGTG GACAGACTCA CTCTGGTAGT GGTTCTACTA 8520

GCACAAATGG TGAGCACAGC CACTACATCG AGGCATGGAA TGGTACTGGT GTAGGTGGTA 8580

ATAAGATGTC ATCATATCCC ATATCATACA GGGCGGGTGG GAGTAACACT AATGCAGCAG 8640

GGAACCACAG TCACACTTTC TCTTTTGGGA CTAGCAGTGC TGGCGACCAT TCCCACTCTG 8700

TAGGTATTGG TGCTCATACC CACACGGTAG CAATTGGATC ACATGGTCAT ACTATCACTG 8760

TAAATAGTAC AGGTAATACA GAAAACACGG TTAAAAACAT TGCTTTTAAC TATATCGTTC 8820

GTTTAGCATA AGGAGAGGGG CTTCGGCCCT TCTAA 8855
(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1289 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacteriophage T4

(vii) IMMEDIATE SOURCE:

(B) CLONE: p34 amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Ala Glu Ile Lys Arg Glu Phe Arg Ala Glu Asp Gly Leu Asp Ala
1 5 10 15

Gly Gly Asp Lys Ile Ile Asn Val Ala Leu Ala Asp Arg Thr Val Gly
20 25 30

Thr Asp Gly Val Asn Val Asp Tyr Leu Ile Gln Glu Asn Thr Val Gln
35 40 45

Gln Tyr Asp Pro Thr Arg Gly Tyr Leu Lys Asp Phe Val Ile Ile Tyr
50 55 60

Asp Asn Arg Phe Trp Ala Ala Ile Asn Asp Ile Pro Lys Pro Ala Gly
65 70 75 80

Ala Phe Asn Ser Gly Arg Trp Arg Ala Leu Arg Thr Asp Ala Asn Trp
85 90 95

Ile Thr Val Ser Ser Gly Ser Tyr Gln Leu Lys Ser Gly Glu Ala Ile
100 105 110

Ser Val Asn Thr Ala Ala Gly Asn Asp Ile Thr Phe Thr Leu Pro Ser
115 120 125

Ser Pro Ile Asp Gly Asp Thr Ile Val Leu Gln Asp Ile Gly Gly Lys
130 135 140

Pro Gly Val Asn Gln Val Leu Ile Val Ala Pro Val Gln Ser Ile Val
145 150 155 160

Asn Phe Arg Gly Glu Gln Val Arg Ser Val Leu Met Thr His Pro Lys
165 170 175

Ser Gln Leu Val Leu Ile Phe Ser Asn Arg Leu Trp Gln Met Tyr Val
180 185 190

Ala Asp Tyr Ser Arg Glu Ala Ile Val Val Thr Pro Ala Asn Thr Tyr
195 200 205

Gln Ala Gln Ser Asn Asp Phe Ile Val Arg Arg Phe Thr Ser Ala Ala
210 215 220

Pro Ile Asn Val Lys Leu Pro Arg Phe Ala Asn His Gly Asp Ile Ile
225 230 235 240

Asn Phe Val Asp Leu Asp Lys Leu Asn Pro Leu Tyr His Thr Ile Val
245 250 255

Thr Thr Tyr Asp Glu Thr Thr Ser Val Gln Glu Val Gly Thr His Ser
260 265 270

Ile Glu Gly Arg Thr Ser Ile Asp Gly Phe Leu Met Phe Asp Asp Asn
275 280 285

Glu Lys Leu Trp Arg Leu Phe Asp Gly Asp Ser Lys Ala Arg Leu Arg
290 295 300

Ile Ile Thr Thr Asn Ser Asn Ile Arg Pro Asn Glu Glu Val Met Val
305 310 315 320

Phe Gly Ala Asn Asn Gly Thr Thr Gln Thr Ile Glu Leu Lys Leu Pro
325 330 335

Thr Asn Ile Ser Val Gly Asp Thr Val Lys Ile Ser Met Asn Tyr Met
340 345 350

Arg Lys Gly Gln Thr Val Lys Ile Lys Ala Ala Asp Glu Asp Lys Ile
355 360 365

Ala Ser Ser Val Gln Leu Leu Gln Phe Pro Lys Arg Ser Glu Tyr Pro
370 375 380

Pro Glu Ala Glu Trp Val Thr Val Gln Glu Leu Val Phe Asn Asp Glu
385 390 395 400

Thr Asn Tyr Val Pro Val Leu Glu Leu Ala Tyr Ile Glu Asp Ser Asp
405 410 415

Gly Lys Tyr Trp Val Val Gln Gln Asn Val Pro Thr Val Glu Arg Val
420 425 430

Asp Ser Leu Asn Asp Ser Thr Arg Ala Arg Leu Gly Val Ile Ala Leu
435 440 445

Ala Thr Gln Ala Gln Ala Asn Val Asp Leu Glu Asn Ser Pro Gln Lys
450 455 460

Glu Leu Ala Ile Thr Pro Glu Thr Leu Ala Asn Arg Thr Ala Thr Glu
465 470 475 480

Thr Arg Arg Gly Ile Ala Arg Ile Ala Thr Thr Ala Gln Val Asn Gln
485 490 495

Asn Thr Thr Phe Ser Phe Ala Asp Asp Ile Ile Ile Thr Pro Lys Lys
500 505 510



Leu Asn Glu Arg Thr Ala Thr Glu Thr Arg Arg Gly Val Ala Glu Ile
515 520 525

Ala Thr Gln Gln Glu Thr Asn Ala Gly Thr Asp Asp Thr Thr Ile Ile
530 535 540

Thr Pro Lys Lys Leu Gln Ala Arg Gln Gly Ser Glu Ser Leu Ser Gly
545 550 555 560

Ile Val Thr Phe Val Ser Thr Ala Gly Ala Thr Pro Ala Ser Ser Arg
565 570 575

Glu Leu Asn Gly Thr Asn Val Tyr Asn Lys Asn Thr Asp Asn Leu Val
580 585 590

Val Ser Pro Lys Ala Leu Asp Gln Tyr Lys Ala Thr Pro Thr Gln Gln
595 600 605

Gly Ala Val Ile Leu Ala Val Glu Ser Glu Val Ile Ala Gly Gln Ser
610 615 620

Gln Gln Gly Trp Ala Asn Ala Val Val Thr Pro Glu Thr Leu His Lys
625 630 635 640

Lys Thr Ser Thr Asp Gly Arg Ile Gly Leu Ile Glu Ile Ala Thr Gln
645 650 655

Ser Glu Val Asn Thr Gly Thr Asp Tyr Thr Arg Ala Val Thr Pro Lys
660 665 670

Thr Leu Asn Asp Arg Arg Ala Thr Glu Ser Leu Ser Gly Ile Ala Glu
675 680 685

Ile Ala Thr Gln Val Glu Phe Asp Ala Gly Val Asp Asp Thr Arg Ile
690 695 700

Ser Thr Pro Leu Lys Ile Lys Thr Arg Phe Asn Ser Thr Asp Arg Thr
705 710 715 720

Ser Val Val Ala Leu Ser Gly Leu Val Glu Ser Gly Thr Leu Trp Asp
725 730 735

His Tyr Thr Leu Asn Ile Leu Glu Ala Asn Glu Thr Gln Arg Gly Thr
740 745 750

Leu Arg Val Ala Thr Gln Val Glu Ala Ala Ala Gly Thr Leu Asp Asn
755 760 765

Val Leu Ile Thr Pro Lys Lys Leu Leu Gly Thr Lys Ser Thr Glu Ala
770 775 780

Gln Glu Gly Val Ile Lys Val Ala Thr Gln Ser Glu Thr Val Thr Gly
785 790 795 800

Thr Ser Ala Asn Thr Ala Val Ser Pro Lys Asn Leu Lys Trp Ile Ala
805 810 815

Gln Ser Glu Pro Thr Trp Ala Ala Thr Thr Ala Ile Arg Gly Phe Val
820 825 830

Lys Thr Ser Ser Gly Ser Ile Thr Phe Val Gly Asn Asp Thr Val Gly
835 840 845

Ser Thr Gln Asp Leu Glu Leu Tyr Glu Lys Asn Ser Tyr Ala Val Ser



850 



855 



860 
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Pro Tyr Olu Lou Asn Arg Val Leu Ala Aan Tyr Leu Pro Leu Lya Ala 

** ** 875 880 

Ly. Ala Ala Aap Thr Asn Leu Leu Asp Gly Leu Aap Ser Ser Gin Phe 

890 895 

II. Arg Arg Jap He Ala In Thr Val Aan Oly Ser Leu Thr Leu Thr 

905 910 
Gin Gin Thr Aan Leu Ser Ala Pro Leu Val Ser ser Ser Thr Gly Glu 

925 

Ph. Gly Gly Ser Leu Ala Ala Aan Arg Thr Phe Thr He Arg Aan Thr 



940 



Oly Ala Pro Thr S.r lie Val Phe Glu Ly. Gly Pro Ala Ser Gly Ala 

*S5 960 
Aan Pro Ala Gin Ser Met Ser He Arg Val Trp Gly Aan Gin Phe Gly 

Gly Gly Ser Asp Thr Thr Arg Ser Thr Val Phe Glu Val Gly Aap Aap 

990 

Thr s.r Hi. Hi. n . Iyr s.r Gl„ ^ ».„ ly . A . p 6ly A .„ „. 

1000 1005 
Phe JenHe Aan Gly Thr Val Met Pro He Aan lie Asn Ala Ser Gly 



1020 



Leu Met Aan Val Aan Gly Thr Ala Thr Phe Gly Arg Ser Val Thr Ala 

1030 1035 1040 

A-n Gly Glu Phe Il^Ser Lys Ser Ala JenAla Phe Arg Ala HeAen 

Oly Aap Tyr Gly Phe Phe He Arg Aan Aap Ala Ser Aan Thr Tyr Phe 

1065 1070 
Leu Leu Thr Ala Ala Gly Aap Gin Thr Gly oly Phe Aan Gly Leu Arg 

* oeo 1085 

JoV^ " e ASn Jot*" «y Cln «• Thr lie Gly Glu Gly 

1095 1100 

Leslie He Ala Lya GlyVal Thr He Aan Ser Gly Gly Leu Thr Val 

1115 1120 
Aan Ser Arg He Ar^Ser Gin Gly Thr LyaTnr Ser Aap Leu Ty^Thr 

Arg Ala Pro Th^Ser Aap Thr Val Gly^he Trp Ser He AjpHe Aan 

A-p Ser Ala Thr Tyr Aan Gin Phe Pro Gly Tyr Phe Ly. Met Val Glu 

1160 1165 

Lya ThrAen Glu Val Thr Gly Leu Pro Tyr Leu Glu Arg Gly Glu Glu 

1180 

Val^y. ser Pro Gly Thr Leu Thr Gin Phe Gly Aan Thr Leu Aap Ser 

1195 1200 
Leu Tyr Gin Aap Trp He Thr Tyr Pr Thr Thr Pro Glu Ala Arg Thr 

1210 12 is 
Thr Arg Trp Thr Arg Thr Trp Gin Lys Thr Lys Asn Ser Trp S r Ser 
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1220 1225 12 30 

Phe Val Gin Val Phe Asp Gly Gly Asn Pro Pro Gin Pro Ser Asp He 
1235 1240 124S 

° ly lite? u Pro S r A8 P A8 " Ala Thr Gly Asn Leu Thr He Arg 

1Z50 1255 1260 

Asp Phe Leu Arg He Gly Asn Val Arg He Val Pro Asp Pro Val Asn 
1265 1270 1275 128Q 

Lys Thr Val Lys Phe Glu Trp Val Glu 
1285 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 65 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacteriophage T4 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: ORF X amino acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 

Met Glu Lys Phe Met Ala Glu Ile Trp Thr Arg Ile Cys Pro Asn Ala
1 5 10 15

Ile Leu Ser Glu Ser Asn Ser Val Arg Tyr Lys Ile Ser Ile Ala Gly
20 25 30

Ser Cys Pro Leu Ser Thr Ala Gly Pro Ser Tyr Val Lys Phe Gln Asp
35 40 45

Asn Pro Val Gly Ser Gln Thr Phe Arg Arg Arg Pro Ser Phe Lys Ser
50 55 60

Phe
65

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 295 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacteriophage T4 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: p35 amino acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Leu Phe Arg Leu Gln Met Ile Leu His Gln Leu Leu Leu Leu Val
1 5 10 15
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Ph. Met Aen Ser Leu Thr Asn Asn Arg lie Val Ala He Leu Thr Ser 

25 30 

Cly Lys Val Aen Ph Pro Pro Glu Val Val - 

35 a* Ser Tr P Leu A*"9 Thr Ala 

*° 4S 

Cly Thr S r Ala Phe Pro Ser Asn Sor n « t 

50 55 p Ser Ile Leu Ser Arg Phe Asp Val 

60 

Sjr Tyr Ala Ala Phe Tyr Thr Ser Ser Lye * g Ala He Ala Leu clu 

7S 80 
81. V.l ly. L«, s.r *.„ to, L y. s. r „„ ^ ^ „. 

"° *"» V " ISi *' P S " L *" «• »'■> ".1 «y »!. Thr «y Ph e 

105 110 

Pro Arg Arg Thr Tyr Clu Ser Val Clu Cl„ Phe Met Ser Ala Val Cly 



125 



Cly Thr Asn Aen Clu Xle Ala Arg Leu Pro Thr Ser Ala Ala He Ser 

Lys Leu Ser Asp Tyr Asn Leu lie Pro Cly Asp Val Leu Tyr Leu Lys 

155 160 
Ala Cln Leu Tyr Ala Asp Ala Asp Leu Leu Ala Leu Cly Thr Thr Asn 

He Ser He Arg Phe Tyr Asn Ala Ser Asn Cly Tyr He Ser Ser Thr 

185 190 
01» M. «« Ph. tht 01y „„ jj. oly _ trp olu ^ iy> 

tyr g» v.1 y.t pro 01u j Ali val 01y ph# ^ ~ ^ Ma 

3 220 
Arg Thr Ala Cln Ala Cly Cln Cly Cly Met Arg Asn Leu Ser Phe Ser 

235 240 
Glu Val ser Arg Asn Cly Cly He Ser Lys Pro Ala Clu Phe Cly Val 

250 255 
Asn Cly He Arg Val Asn Tyr He cys Clu Ser Ala Ser Pro Pro Asp 



270 



He Met Val Leu Pro Thr Cln Ala Ser Ser Lys Thr Cly Lys Val Phe 

285 



Cly Cln Glu Phe Arg Glu Val 
290 ~ 295 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 221 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacteriophage T4

(vii) IMMEDIATE SOURCE:

(B) CLONE: p36 amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Met Ala Asp Leu Lys Val Gly Ser Thr Thr Gly Gly Ser Val Ile Trp
1 5 10 15

His Gln Gly Asn Phe Pro Leu Asn Pro Ala Gly Asp Asp Val Leu Tyr
20 25 30

Lys Ser Phe Lys Ile Tyr Ser Glu Tyr Asn Lys Pro Gln Ala Ala Asp
35 40 45

Asn Asp Phe Val Ser Lys Ala Asn Gly Gly Thr Tyr Ala Ser Lys Val
50 55 60

Thr Phe Asn Ala Gly Ile Gln Val Pro Tyr Ala Pro Asn Ile Met Ser
65 70 75 80

Pro Cys Gly Ile Tyr Gly Gly Asn Gly Asp Gly Ala Thr Phe Asp Lys
85 90 95

Ala Asn Ile Asp Ile Val Ser Trp Tyr Gly Val Gly Phe Lys Ser Ser
100 105 110

Phe Gly Ser Thr Gly Arg Thr Val Val Ile Asn Thr Arg Asn Gly Asp
115 120 125

Ile Asn Thr Lys Gly Val Val Ser Ala Ala Gly Gln Val Arg Ser Gly
130 135 140

Ala Ala Ala Pro Ile Ala Ala Asn Asp Leu Thr Arg Lys Asp Tyr Val
145 150 155 160

Asp Gly Ala Ile Asn Thr Val Thr Ala Asn Ala Asn Ser Arg Val Leu
165 170 175

Arg Ser Gly Asp Thr Met Thr Gly Asn Leu Thr Ala Pro Asn Phe Phe
180 185 190

Ser Gln Asn Pro Ala Ser Gln Pro Ser His Val Pro Arg Phe Asp Gln
195 200 205

Ile Val Ile Lys Asp Ser Val Gln Asp Phe Gly Tyr Tyr
210 215 220

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1026 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacteriophage T4 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: p37 amino acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Ala Thr Leu Lys Gln Ile Gln Phe Lys Arg Ser Lys Ile Ala Gly
1 5 10 15
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Thr Arg Pr Ala Ala Ser Val Leu Ala Glu «y oiu Leu Ala 11. Aen 

I-u Ly. Asp Arg Thr II Phe Thr Lys Aep Asp ser Gly Aen He He 

Aep Leu Oly Phe Ala Lys Gly Gly Gin Val Asp Gly Asn Val Thr lie 

Ajn Gly Leu Leu Arg Leu Asn Gly Asp Tyr Val Gin Thr Gly Gly Met 

7S * 80 

Thr Val Asn Gly Pro lie Oly Ser Thr Asp Gly Val Thr Gly Lys He 

90 95 

Ph. Arg ser Thr Gin Gly Ser Phe Tyr Ala Arg Ala Thr Asn Asp Thr 

iU5 no 
Ssr Asn Ala His Leu Trp Phe Glu Asn Ala Asp Gly Thr Glu Arg Gly 

Val xij Tyr Ala Arg Pro Gin Thr Thr Thr Asp Gly Glu He Arg Leu 

*«" 140 

Arg V.1 Arg Gin Gly Thr Gly Ser Thr Ala Asn Ser Glu Phe Tyr Phe 

155 160 

Arg Ser lie Asn Gly Gly Glu Phe Gin Ala Asn Arg He Leu Ala Ser 

170 175 

Asp ser Leu Val Thr Lys Arg He Ala Val Asp Thr Val He His Asp 



190 



Ala Lys Ala Phe Gly Gin Tyr Asp Ser His ser Leu Val Asn Tyr Val 



205 



Tyr Pro Gly Thr Gly Glu Thr Asn Gly Val Asn Tyr Leu Arg Lys Val 

220 

Arg Ala Lys Ser Gly Gly Thr He Tyr His Glu He Val Thr Ala Gin 

230 235 240 

Thr Gly Leu Ala Asp Glu Val Ser Trp Trp Ser Gly Asp Thr Pro Val 

250 255 
Ph. Lys Leu Tyr Gly He Arg Asp Asp Gly Arg Met He He Arg Asn 

265 270 
Ser Leu Ala Leu Gly Thr Phe Thr Thr Asn Phe Pro ser Ser Asp Tyr 

285 

Gly Asn Val Cly Val Met Gly Asp Lys Tyr Leu Val Leu Gly Asp Thr 



300 



MS 61y ^ Ser S r L * 8 ^ Th * Val Phe Asp Leu Val Gly 

310 315 

Gly Gly Tyr Ser Val Ala Ser He Thr Pro Asp Ser Phe Arg Ser Thr 
25 330 33S 

Arg Lys Gly lie Ph. Gly Arg Ser Glu Asp Gin Gly Ala Thr Trp He 

345 350 

Met Pro Gly Thr Asn Ala Ala Leu Leu Ser Val Gin Thr Gin Ala Asp 
*" 360 355 * 

Asn Asn Asn Ala Gly Asp Gly Gln Thr His Ile Gly Tyr Asn Ala Gly
370 375 380

Gly Lys Met Asn His Tyr Phe Arg Gly Thr Gly Gln Met Asn Ile Asn
385 390 395 400

Thr Gln Gln Gly Met Glu Ile Asn Pro Gly Ile Leu Lys Leu Val Thr
405 410 415

Gly Ser Asn Asn Val Gln Phe Tyr Ala Asp Gly Thr Ile Ser Ser Ile
420 425 430

Gln Pro Ile Lys Leu Asp Asn Glu Ile Phe Leu Thr Lys Ser Asn Asn
435 440 445

Thr Ala Gly Leu Lys Phe Gly Ala Pro Ser Gln Val Asp Gly Thr Arg
450 455 460

Thr Ile Gln Trp Asn Gly Gly Thr Arg Glu Gly Gln Asn Lys Asn Tyr
465 470 475 480

Val Ile Ile Lys Ala Trp Gly Asn Ser Phe Asn Ala Thr Gly Asp Arg
485 490 495

Ser Arg Glu Thr Val Phe Gln Val Ser Asp Ser Gln Gly Tyr Tyr Phe
500 505 510

Tyr Ala His Arg Lys Ala Pro Thr Gly Asp Glu Thr Ile Gly Arg Ile
515 520 525

Glu Ala Gln Phe Ala Gly Asp Val Tyr Ala Lys Gly Ile Ile Ala Asn
530 535 540

Gly Asn Phe Arg Val Val Gly Ser Ser Ala Leu Ala Gly Asn Val Thr
545 550 555 560

Met Ser Asn Gly Leu Phe Val Gln Gly Gly Ser Ser Ile Thr Gly Gln
565 570 575

Val Lys Ile Gly Gly Thr Ala Asn Ala Leu Arg Ile Trp Asn Ala Glu
580 585 590

Tyr Gly Ala Ile Phe Arg Arg Ser Glu Ser Asn Phe Tyr Ile Ile Pro
595 600 605

Thr Asn Gln Asn Glu Gly Glu Ser Gly Asp Ile His Ser Ser Leu Arg
610 615 620

Pro Val Arg Ile Gly Leu Asn Asp Gly Met Val Gly Leu Gly Arg Asp
625 630 635 640

Ser Phe Ile Val Asp Gln Asn Asn Ala Leu Thr Thr Ile Asn Ser Asn
645 650 655

Ser Arg Ile Asn Ala Asn Phe Arg Met Gln Leu Gly Gln Ser Ala Tyr
660 665 670

Ile Asp Ala Glu Cys Thr Asp Ala Val Arg Pro Ala Gly Ala Gly Ser
675 680 685

Phe Ala Ser Gln Asn Asn Glu Asp Val Arg Ala Pro Phe Tyr Met Asn
690 695 700

Ile Asp Arg Thr Asp Ala Ser Ala Tyr Val Pro Ile Leu Lys Gln Arg
705 710 715 720

Tyr Val Gln Gly Asn Gly Cys Tyr Ser Leu Gly Thr Leu Ile Asn Asn
725 730 735
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«ly A.„ Ph. »j, v.! Hi. Tyr ^ „ My „ y A<p ifct 

'* 5 7S0 
«y Pr «n Thr AX. A . p Ph „ y Trp 01u phe Ue ^ Xsn oly A.p 

Z'o S ' C **■ A " P & » «V Val p„. A . p ^ 

«y «n .1. Thr «y oly ,. r 01y A .„ A>n ^ ^ 

800 

° l " l "' Tta *• "« *" *r s.r Tyr „. oly 

Ala Pro He p ro Trp Pro Ser Asp Ser Val p rn 

820 H o,f vaA Pro AIa Oly Phe Ala Leu 

82S 830 

M«t Clu Gly Gin Thr Phe Asp Lys Ser Ala is,- » 

83S p JJJ Ser Ala ^ Pro Lys Leu Ala Val 

845 

Pro «- v " U' s p " *«p »« *** «y ci„ , hr xx. ly . 

860 

a "* SJ ™ — — JU OXU AX. A.p oi y v.X 

875 880 
Lys Ala His Ser His Ser Ala Ser Ala Ser <!o^ -ru . 

885 ia Itn V Thr A8 P Lau °ly Thr 

890 895 

Lys Thr Thr Ser Ser Phe Asp Tyr Glv Thr ft,. 

900 P y go? Tnr Lys Oly Thr Asn Ser Thr 

905 910 
Oly «y «. Thr „ t . s . t cl y jg C1 y S . r Thr s . r „ ? ^ wu 

Hi. £r Hi. Tyr .1. cXu »1. Trp A .„ Gly Ihr „ y ^ 

»« "~ ~ Iyr & "* S « s„ A .„ Ihr 

955 960 
"* "* "* « H " S " Hi ' ??■ »r P h . Gly T hr ^ ser 

Ala Gly Asp His Ser His Ser Val Cly He Glv lla u< 

980 oo£ Ae Ala His Thr His Thr 

»•! «. XX. «y S.r Hi. Cly Hi. Thr XX. Thr y.x A.„ s.r Thr CXy 

1000 1005 
».n TteCXu A.„ Thr V.X Ly.^.n XX. AX. Ph. J.gTyr ,1. V.X Ar, 



Lou Ala 
1025 
What is claimed is:

1. An isolated polypeptide consisting essentially
of the gp37 tail fiber protein of bacteriophage T4 lacking
amino acids 99-496 (SEQ ID NO: 6) when numbered from the amino
terminus, wherein said polypeptide has the capability to form
dimers and interact with the P36 protein oligomer of
bacteriophage T4.

2. An isolated polypeptide consisting essentially
of a fusion protein between the gp36 and gp37 proteins of
bacteriophage T4, wherein amino acid residues 1-242 of gp37
(SEQ ID NO: 6) are fused in proper reading frame to amino acid
residues 118-221 of gp36 (SEQ ID NO: 5).

3. The polypeptide of claim 2 wherein a plurality
of carboxy termini of said polypeptide have the capability of
interacting with the amino terminus of the P37 protein
oligomer of bacteriophage T4 and to form an attached oligomer
and the amino termini of the oligomer of said polypeptide
have the capability of interacting with the carboxy termini
of gp36 polypeptides of bacteriophage T4.

4. An isolated polypeptide oligomer consisting
essentially of two gp37 polypeptides of bacteriophage T4,
wherein the amino termini of said oligomer lack the
capability of interacting with the carboxy termini of gp36
polypeptides of bacteriophage T4.

5. An isolated polypeptide oligomer consisting
essentially of the P37 protein of bacteriophage T4, wherein
the amino termini of said oligomer lack the capability of
interacting with the carboxy termini of gp36 polypeptides of
bacteriophage T4.

6. An isolated polypeptide consisting essentially
of a variant of the gp36 protein of bacteriophage T4, wherein
said polypeptide lacks the capability of interacting with the
amino terminus of the P37 protein oligomer of bacteriophage

7. An isolated polypeptide consisting essentially
of a fusion protein between the gp36 and gp34 proteins of
bacteriophage T4, wherein amino acid residues 1-73 of gp36
(SEQ ID NO: 5) are fused in proper reading frame
amino-terminal to amino acid residues 866-1289 of gp34 (SEQ



8. An oligomer of the polypeptide of claim 7, 
wherein the amino termini of said dimer have the capability 
of interacting with the gp35 protein of bacteriophage T4. 

9. An isolated polypeptide consisting essentially
of a variant of the gp35 protein of bacteriophage T4, wherein
said polypeptide forms an angle of less than about 125° when
combined with the P34 and P36-P37 protein oligomers of
bacteriophage T4, under conditions wherein the wild-type gp35
protein forms an angle of 137° when combined with said
oligomers.

10. An isolated polypeptide consisting essentially
of a variant of the gp35 protein of bacteriophage T4, wherein
said polypeptide forms an angle of more than about 145° when
combined with the P34 and P36-P37 protein oligomers of
bacteriophage T4, under conditions wherein the wild-type gp35
protein forms an angle of 137° when combined with said
oligomers.

11. An isolated polypeptide consisting essentially
of a variant of the gp35 protein of bacteriophage T4, wherein
the interaction of said polypeptide with the P34 protein
oligomer of bacteriophage T4 is unstable at temperatures
between about 40°C and about 60°C.

12. An isolated polypeptide oligomer consisting
essentially of a variant of the P37 protein of bacteriophage
T4, wherein the interaction of said oligomer with the P36
protein oligomer of bacteriophage T4 is unstable at
temperatures between about 40°C and about 60°C.

13. An isolated polypeptide oligomer consisting
essentially of a variant of the P37 protein of bacteriophage
T4, wherein the carboxy-terminal domain of said oligomer is
modified so as to confer the ability of the entire
polypeptide to bind specifically to an immobilized ligand.

14. The polypeptide of claim 13, wherein said
ligand is selected from the group consisting of biotin,
immunoglobulin, or divalent metal ions.

15. A nanostructure comprising a plurality of
fusion proteins, said fusion proteins comprising a first
portion consisting of at least the first 10 N-terminal amino
acids of a tail fiber protein fused via a peptide bond to a
second portion consisting of at least the last 10 C-terminal
amino acids of a second tail fiber protein, wherein the tail
fiber proteins are selected from the group consisting of
gp34, gp35, gp36, and gp37 proteins of a T-even-like
bacteriophage, wherein the first and second tail fiber
proteins are the same or different.

16. The nanostructure of claim 15, wherein the 
first and second tail fiber proteins are different. 

17. The nanostructure of claim 15, which further 
comprises a molecule that can self-assemble into a dimer or 
trimer, fused to at least a 10 amino acid portion of a 
T-even-like tail fiber protein. 

18. The nanostructure of claim 17, wherein the
molecule has the structure of a leucine zipper.

19. The nanostructure of claim 15, wherein said
nanostructure comprises a linear one-dimensional rod.

20. The nanostructure of claim 15, wherein said
nanostructure comprises a polygon.

21. The nanostructure of claim 15, wherein said
nanostructure comprises a three-dimensional cage or solid.

22. The nanostructure of claim 15, wherein said
nanostructure comprises a two-dimensional open or closed
sheet.

23. An isolated fusion protein consisting
essentially of a portion of a gp37 protein of a T-even-like
bacteriophage consisting of at least the first 10-60
N-terminal amino acids of the gp37 protein fused to a second
portion of a gp36 protein of a T-even-like bacteriophage
consisting of at least the last 10-60 C-terminal amino acids
of the gp36 protein.

24. An isolated fusion protein consisting
essentially of a portion of a gp37 protein of a T-even-like
bacteriophage consisting of at least the first 10 N-terminal
amino acids of the gp37 protein fused to a second portion of
a gp36 protein of a T-even-like bacteriophage consisting of
at least the last 10 C-terminal amino acids of the gp36
protein.

25. An isolated fusion protein consisting
essentially of a portion of a gp37 protein of a T-even-like
bacteriophage consisting of at least the first 20 N-terminal
amino acids of the gp37 protein fused to a second portion of
a gp36 protein of a T-even-like bacteriophage consisting of
at least the last 20 C-terminal amino acids of the gp36
protein.

26. An isolated fusion protein consisting
essentially of a portion of a gp36 protein of a T-even-like
bacteriophage consisting of at least the first 10-60
N-terminal amino acids of the gp36 protein fused to a second
portion of a gp34 protein of a T-even-like bacteriophage
consisting of at least the last 10-60 C-terminal amino acids
of the gp34 protein.

27. An isolated protein comprising at least 20
contiguous amino acids of the gp37, gp36, or gp34 protein of
a T-even-like bacteriophage, and lacking at least 5 amino
acids of the amino- or carboxy-terminus of the protein.

28. An isolated DNA encoding the polypeptide of claim 1.

29. An isolated DNA encoding the polypeptide of claim 2.

30. An isolated DNA encoding the polypeptide of claim 4.

31. An isolated DNA encoding the polypeptide of claim 5.

32. An isolated DNA encoding the polypeptide of claim 6.

33. An isolated DNA encoding the polypeptide of claim 7.

34. An isolated DNA encoding the polypeptide of claim 9.

35. An isolated DNA encoding the polypeptide of claim 10.


WO 96/11947

PCT/US95/13023



36. An isolated DNA encoding the polypeptide of claim 11.

37. An isolated DNA encoding the polypeptide of claim 12.

38. An isolated DNA encoding the polypeptide of claim 13.

39. An isolated DNA encoding the protein of claim 23.

40. An isolated DNA encoding the protein of claim 25.

41. An isolated DNA encoding the protein of claim 26.

42. An isolated DNA encoding the protein of claim 27.



43. A method for making a polygonal nanostructure
comprising contacting the protein of claim 26 with purified
gp35 proteins of a T-even-like bacteriophage.

44. A method for making a nanostructure comprising
contacting a plurality of the proteins of claim 23 with each
other.

45. A kit comprising in one or more containers the
fusion protein of claim 23.

46. A kit comprising in one or more containers the
fusion protein of claim 25.

47. A kit comprising in one or more containers the
fusion protein of claim 26.

48. A kit comprising in one or more containers the
fusion protein of claim 26, and an isolated gp35 protein of a
T-even-like bacteriophage.

49. The protein of claim 23 wherein the T-even-like
bacteriophage is T4.

50. The protein of claim 26 wherein the T-even-like
bacteriophage is T4.

51. An isolated polypeptide consisting essentially
of a variant of the gp36 protein of bacteriophage T4, wherein
the interaction of said polypeptide with the P37 protein
oligomer of bacteriophage T4 is unstable at temperatures
between about 40°C and about 60°C.

52. An isolated polypeptide consisting essentially
of a variant of the gp36 protein of bacteriophage T4, wherein
the interaction of said polypeptide with the gp35 protein of
bacteriophage T4 is unstable at temperatures between about
40°C and about 60°C.

53. An isolated polypeptide consisting essentially
of a variant of the gp34 protein of bacteriophage T4, wherein
the interaction of said polypeptide with the gp35 protein of
bacteriophage T4 is unstable at temperatures between about
40°C and about 60°C.
     | 10       | 20       | 30       | 40       | 50       | 60
   1 TAGGAGCCCG GGAGAATGGC CGAGATTAAA AGAGAATTCA GAGCAGAAGA TGGTCTGGAC 60
  61 GCAGGTGGTG ATAAAATAAT CAACGTAGCT TTAGCTGATC GTACCGTAGG AACTGACGGT 120
 121 GTTAACGTTG ATTACTTAAT TCAAGAAAAC ACAGTTCAAC AGTATGATCC AACTCGTGGA 180
 181 TATTTAAAAG ATTTTGTAAT CATTTATGAT AACCGCTTTT GGGCTGCTAT AAATGATATT 240
 241 CCAAAACCAG CAGGAGCTTT TAATAGGGGA CGCTGGAGAG CATTACGTAC CGATGCTAAC 300
 301 TGGATTAOGG TTTCATCTGG TTCATATCAA TTAAAATCTG GTGAAGCAAT TTCGGTTAAC 360
 361 ACGGCAGCTG GAAATGACAT CAGGTTTACT TTACCATCTT CTCCAATTGA TGG7GATACT 420
 421 ATGGTTCTOC AAGATATTGG AGGAAAAGCT GGAGTTAAOC AAGTTTTAAT TGTAGCTCCA 480
 481 GTACAAAGTA TTGTAAACTT TAGAGGTGAA CAGGTACGTT CAGTACTAAT GACTCATCCA 540
 541 AAGTCACAGC TAGTTTTAAT TTTTAGTAAT CGTCTGTGGC AAATGTATGT TGCTGATTAT 600
 601 AGTAGAGAAG CTATAGTTGT AACACCAGCG AATACTTATC AAGCGCAATC CAACGATTTT 660
 661 ATOGTACGTA GATTTACTTC TGCTGCACCA ATTAATGTCA AACTTCCAAG ATTTGCTAAT 720
 721 CATGGCGATA TTATTAATTT CGTOGATTTA GATAAACTAA ATCCGCTTTA TCATACAATT 780
 781 GTTACTACAT ACGATGAAAC GACTTCAGTA CAAGAAGTTG GAACTCATTC CATTGAAGGC 840
 841 CGTACATCGA TTGACGGTTT CTTGATGTTT GATGATAATG AGAAATTATG GAGACTGTTT 900
 901 GACGGGGATA GTAAAGCGCG TTTACGTATC ATAACGACTA AnCAAACAT TCGTCCAAAT 960
 961 GAAGAAGTTA TGGTATTTGG TGCGAATAAC GGAACAACTC AAACAATTGA GCTTAAGCTT 1020
1021 CCAACTAATA TTTCTGTTGG TGATACTGTT AAAATTTCCA TGAATTACAT GAGAAAAGGA 1080
1081 CAAACAGTTA AAATCAAAGC TGCTGATGAA GATAAAATTG CTTCTTCAGT TCAATTGCTG 1140
1141 CAATTGCCAA AACGCTCAGA ATATOCACCT GAAGCTGAAT GGGTTACAGT TCAAGAATTA 1200
1201 GTTTTTAAOG ATGAAACTAA TTATGTTGCA GTTTTGGAGC TTGCTTACAT AGAAGATTCT 1260
1261 GATGGAAAAT ATTGGGTTGT ACAGCAAAAC GTTCCAACTG TAGAAAGAGT AGATTCTTTA 1320
1321 AATGATTCTA CTAGAGCAAG ATTAGGCGTA ATTGCTTTAG CTACACAAGC TCAAGCTAAT 1380
1381 GTGGATTTAG AAAATTCTCC ACAAAAAGAA TTAGCAATTA CTCCAGAAAC GTTAGCTAAT 1440
1441 CGTACTGCTA CAGAAACTCG CAGAGGTATT GCAAGAATAG CAACTACTGC TCAAGTGAAT 1500
1501 CAGAACACCA CATTCTCTTT TGCTGATGAT ATTATCATCA CTCCTAAAAA GCTGAATGAA 1560
1561 AGAACTGCTA CAGAAACTCG TAGAGGTGTC GCAGAAATTG CTACGCAGCA AGAAACTAAT 1620
1621 GCAGGAACCG ATGATACTAC AATCATCACT CCTAAAAAGC TTCAAGCTOG TCAAGGTTCT 1680
1681 GAATCATTAT CTGGTATTGT AACCTTTGTA TCTACTGCAG GTGCTACTCC AGCTTCTAGC 1740
1741 OGTGAATTAA ATGGTACGAA TGTTTATAAT AAAAACACTG ATAATTTAGT TGTTTCACCT 1800
1801 AAAGCTTTGG ATCAGTATAA AGCTACTCCA ACACAGCAAG GTGCAGTAAT TTTAGCAGTT 1860
1861 GAAAGTGAAG TAATTGCTGG ACAAAGTCAG CAAGGATGGG CAAATGCTGT TGTAACGCCA 1920
1921 GAAAOGTTAC ATAAAAAGAC ATCAACTGAT GGAAGAATTG GTTTAATTGA AATTGCTACG 1980
1981 CAAAGTGAAG TTAATACAGG AACTGATTAT ACTOGTGCAG TCACTCCTAA AACTTTAAAT 2040
2041 GACOGTAGAG CAACTGAAAG TTTAAGTGGT ATAGCTGAAA TTGCTACACA AGTTGAATTC 2100
2101 GACGCAGGCG TCGAOGATAC TCGTATCTCT ACACCATTAA AAATTAAAAC CAGATTTAAT 2160
2161 AGTACTGATC GTACTTCTGT TGTTGCTCTA TCTGGATTAG TTGAATCAGG AACTCTCTGG 2220
2221 GACCATTATA CACTTAATAT TCTTGAAGCA AATGAGACAC AACGTGGTAC ACTTCGTGTA 2280
2281 GCTACGCAGG TCGAAGCTGC TGCGGGAACA TTAGATAATG TTTTAATAAC TCCTAAAAAG 2340
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2341 CTTTTAGGTA CTAAATCTAC TGAAGGGCAA 

2401 GAAACTGTGA CTGGAACGTC AGCAAATACT 

2461 GGGCAGAGTG AACCTACTTG GGCAGCTACT 

2521 TCTGGTTCAA TTACATTCGT TGGTAATGAT 

2581 TATGAGAAAA ATAGCTATGC GGTATCACCA 

2641 TTGCCACTAA AAGCAAAAGC TGCTGATACA 

2701 TTCATTCGTA GGGATATTGC ACAGACGGTT 

2761 AATCTGAGTG CCCCTCTTGT ATCATCTAGT 

2821 AATAGAACAT TTACCATCCG TAATACAGGA 

2881 CCTGCATCCG GGGCAAATCC TGCACAGTCA 

2941 GGGGGOGGTA GTGATACGAC GGGTTCGACA 

3001 CACTTTTATT CTCAACGTAA TAAAGAGGGT 

3061 ATGCCAATAA ACATTAATGC TTCOGGTTTG 

3121 CGTTCAGTTA CAGCCAATGG TGAATTCATC 

3181 AAGGGTGATT AOGGATTCTT TATTGGTAAT 

3241 GCAGCCGGTG ATCAGACTGG TGGTTTTAAT 

3301 TCCGGTCAGA TTACAATTGG TGAAGGCTTA 

3361 GGGGGTTTAA CTGTTAACTC GAGAATTOGT 

3421 ACCCGTGOGC CAACATCTGA TACTGTAGGA 

3481 ACTTATAACC AGTTCCCGGG TTATTTTAAA 

3541 CTTCCATACT TAGAACGTGG OGAAGAAGTT 

3601 AACACACTTG ATTCGCTTTA CCAAGATTCG 

3661 ACCACTCGCT GGACACGTAC ATGGCAGAAA 

3721 GTATTTGAGG GAGGTAACCC TCCTCAACCA 

3781 GCTACAATGG GGAATCTTAC TATTOGTGAT 

3841 CCTGAGCCAG TGAATAAAAC GGTTAAATTT 

3901 AATTTATGGC CGAGATTTGG ACAAGGATAT 

3961 CAGTAAGATA TAAAATAAGT ATAGGGGGTT 

4021 ATGTTAAATT TCAGGATAAT CCTGTAGGAA 

4081 AGAGTTTTTG ACCCTTCCAC CGGAGCATTA 

4141 TCAAATGATA CTACATCAGC TGCTTTTGTT 

4201 AATTGTTGCT ATATTAACTA GTGGAAAGGT 

4261 AAGAACCGCC GGAAOGTCTG CCTTTCCATC 

4321 ATATGCTGCT TTTTATACTT CTTCTAAAAG 

4381 TAATAGAAAA AGCACASATG ATTATCAAAC 

4441 AGATGTAGGA GCTACCGGGT TTCCAAGAAG 

4501 GGCAGTTGGT GGAACTAATA ACGAAATTGC 
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4741 AAAGGAAGAT TATGTAGTTG TTCCAGAAAA 

4801 AACTGCACAA GCTGGCCAAG GTGGCATGAG 

4861 TGGCGGCATT TCGAAACCTG CTGAATTTGG 

4921 CGAATCGGCT TCACCTCCGG ATATAATGGT 

4981 TAAAGTGTTT GGGCAAGAAT TTAGAGAAGT 

5041 TTCTTTATAA ATACTATTCA AATAAAGGGG 

5101 ACAACTGGAG GCTCTGTCAT TTGGCATCAA 

5161 GATGTACTCT ATAAATCATT TAAAATATAT 
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5281 GGCATTCAAG TCCCATATGC TCCAAACATC 

5341 GGTGATGGTG CTACTTTTGA TAAAGCAAAT 

5401 TTTAAATGGT CATTTGGTTC AACAGGCOGA 

5461 ATTAACACAA AAGGTGTTGT GTCGGCAGCT 
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6001 GACCCATTGG TTCTACTGAT GGGGTCACTG 
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6781 CTGAGGACCA AGGCGCAACT TGGATAATGC 
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7081 CTAAATCTAA TAATACTGCG GGTCTTAAAT 
PROTEIN STRUCTURES AND PROTEIN FIBRES

This invention relates to protein structures, to methods of producing those protein structures, 
and to protein fibres and other materials and assemblies produced using those protein 
structures. 

The process of molecular self-assembly is central to all biological systems and is assuming
increasing importance and application in biotechnology (L. Q. Gu et al. (1999) Nature 398,
686) and nanotechnology (K. E. Drexler (1999) TIBTECH 17, 5). The characterization of
natural biomolecular assemblies motivates and directs the development of model
self-assembling systems and, in turn, these advance our understanding of biology. For
proteins at least, the coiled coil is arguably the simplest self-assembling system. Coiled coils
are protein-folding motifs that direct and cement a wide variety of protein-protein interactions
(A. Lupas (1996) Trends Biochem. Sci. 21, 375). In structural terms, coiled coils are
relatively straightforward: they are α-helical bundles with between 2 and 5 strands that can be
arranged in parallel, antiparallel or mixed topologies. The basic sequence features that guide
the formation of coiled coils from peptides are reasonably well understood (P. B. Harbury et
al. (1993) Science 262, 1401; D. N. Woolfson and T. Alber (1995) Protein Sci. 4, 1596). For
instance, most coiled-coil sequences are dominated by a 7-residue repeat of hydrophobic (H)
and polar (P) residues, (HPPHPPP)n, known as the "heptad repeat". When configured into an
α-helix this pattern gives an amphipathic structure, the hydrophobic face of which directs
oligomer assembly. Furthermore, both the number and the direction of chains within a
coiled-coil bundle are determined predominantly by residues that form or flank the hydrophobic
core, namely residues at the first, fourth, fifth and seventh positions of the heptad repeat. For
instance, coiled coils which form dimers (i.e. two-stranded assemblies) usually have
isoleucine or valine residues at the first position and a leucine residue at the fourth position.
By contrast, coiled coils that form trimers (i.e. three-stranded assemblies) often have the same
residues (i.e. both isoleucine or both leucine) at both "H" positions. Finally, hetero-oligomers
(that is, coiled coils made from strands with different amino-acid sequences) may be directed
by complementary charged interactions that flank the hydrophobic core. For these reasons,
there have been a number of successful de novo protein designs based on the coiled coil.
These include some ambitious structures that extend the natural repertoire of coiled-coil
motifs (S. Nautiyal et al. (1995) Biochemistry 34, 11645; A. Lombardi et al. (1996)
Biopolymers 40, 495; D. H. Lee et al. (1996) Nature 382, 525; P. B. Harbury et al. (1998)
Science 282, 1462; J. P. Schneider et al. (1998) Folding Des. 3, R29).

In addition to commonly accepted structures with a single, contiguous heptad repeat, the 
inventors have identified sequences with multiple, offset heptad repeats which help explain 
oligomer-state specification in coiled coils. For example, sequences with two heptad repeats 
offset by two residues, i.e. a/f'-b/g'-c/a'-d/b'-e/c'-f/d'-g/e', set up two hydrophobic seams on 
opposite sides of the helix formed. Such helices may combine to bury these hydrophobic 
surfaces in two different ways and form two distinct structures: open "α-sheets" and closed 
"α-cylinders". 

Other relevant aspects of coiled-coil structure are described in WO99/11774, the disclosure of 
which is incorporated herein by way of reference. 

This understanding of coiled coils, and the resulting protein designs, centres on short 
structures as exemplified by the leucine-zipper motifs (E. K. O'Shea et al (1989) Science 243, 
538; E. K. O'Shea et al (1991) Science 254, 539), which are found in a variety of transcription 
factors. In contrast, most natural coiled coils extend over hundreds of amino acids (A. Lupas 
(1996) supra; J. Sodek et al (1972) Proc. Natl Acad. Sci. U.S.A. 69, 3800) and many 
assemble further to form thicker, multi-stranded filaments (H. Herrmann and U. Aebi (1998) 
Curr. Opin. Struct. Biol 8, 177). 

With the goal of making elongated structures to improve our understanding of coiled coils, 
and to develop protein-design studies, we initially designed two 28-residue peptides 
(dubbed Self-Assembling Fibre peptides, SAF-p1 and SAF-p2) to fold and form extended 
fibres when mixed. Focusing on the buried, hydrophobic-core positions of the structure, rules 
were incorporated to direct parallel dimer formation and to guard against alternative 
oligomers and topologies (P. B. Harbury et al (1993) supra; D. N. Woolfson and T. Alber 
(1995) supra; L. J. Gonzalez et al (1996) Nature Struct. Biol. 3, 1011). The building block of 
the design was a staggered heterodimer with overhanging or "sticky" ends. This contrasts 
with and distinguishes it from the natural and designer coiled-coil assemblies that have been 
characterized to date, in which the polypeptide strands align in-register, i.e. they have blunt or 
"flush" ends. Complementary core interactions and flanking ion-pairs were incorporated into 
the overhangs to facilitate longitudinal association of the heterodimers (Figs. 1&2). This 
principle of using "sticky ends" is well developed in molecular biology for assembling DNA 
(S. J. Palmer et al (1998) Nucleic Acids Res. 26, 2560), and has been used to design intricate 
DNA crystals (E. Winfree et al (1998) Nature 394, 539). However, to our knowledge, our 
application of sticky end-directed molecular assembly to peptides is new, although we do note 
that head-to-tail packing of helices has been observed in recently solved crystal structures for 
two designer peptides (N. L. Ogihara et al (1997) Protein Sci. 6, 80; G. G. Prive et al (1999) 
Protein Sci. 8, 1400). These were helical peptides that crystallised with their helical ends in 
contact so as to form pseudo-continuous helices in the solid state. In other words, they formed 
"blunt-ended" arrangements. 

According to one aspect of the invention there is provided a protein structure comprising a 
plurality of first peptide monomer units arranged in a first strand and a plurality of second 
peptide monomer units arranged in a second strand, the strands preferably forming a coiled- 
coil structure, and in which a first peptide monomer unit in the first strand extends beyond a 
corresponding second peptide monomer unit in the second strand in the direction of the 
strands. The protein structures of the invention have numerous advantages. For example, 
relatively long protein fibres can be formed with little material: 1 µl of a 100 µM solution of 
the peptide monomers may provide enough material to form 10 m of fibre 50 nm thick. 

At least one charged amino acid residue of the first peptide monomer unit may be arranged to 
attract an oppositely-charged amino acid residue of the second peptide monomer unit. 
Preferably, the charged amino acid residue is in an end portion of the first peptide monomer 
unit, which extends beyond the corresponding second peptide monomer unit in the second 
strand. At least one strand may consist solely of first or second peptide monomer units 
respectively, i.e. homogeneous strands. Heterologous strands are also contemplated. The 
peptide monomer units may comprise a repeating structural unit. Preferably, the repeating 
structural unit comprises a heptad repeat motif, having the pattern: 

hpphppp 
abcdefg 

Preferably, the repeat may include isoleucine or asparagine at position a and leucine at 
position d. Other repeats (e.g. hendecads, abcdefghijk) and amino acid compositions may 
also be used (see WO99/11774). 

Preferably, the heptad repeat comprises oppositely-charged residues at positions e and g 
respectively. The oppositely-charged residues may be, for example, glutamic acid and lysine 
residues or arginine and aspartic acid. The use of synthetic amino acids, such as ornithine, is 
also envisaged. 

A protein structure in accordance with the invention may be also specified by pairs of 
asparagine residues in the "a" positions provided by corresponding first and second peptide 
monomer units. 

In a preferred protein structure, the first and second peptide monomer units have the following 
sequences: 

a) KIAALKQKIASLKQEIDALEYENDALEQ (SAF-p1C) and 

b) KIRALKAKNAHLKQEIAALEQEIAALEQ (SAF-p2D) respectively; or 

c) KIAALKQKIAALKQEIDALEYENDALEQ (SAF-p1A) and 

d) KIRALKWKNAHLKQEIAALEQEIAALEQ (SAF-p2C) respectively; or 

e) KIAALKQKIASLKQEIDALEYENDALEQ (SAF-p1C) and 

f) KIRALKWKNAHLKQEIAALEQEIAALEQ (SAF-p2C) respectively. 

It will be appreciated that these are examples only of 4-heptad structures and that other 
lengths are possible and envisaged for use in the invention. 
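
By way of illustration only, the heptad register of the preferred sequences may be checked with a short Python sketch. It assumes that each 28-residue sequence begins at heptad position g, which is an inference from the statement that isoleucine or asparagine occupies position a and leucine position d; the function name `heptad_positions` is hypothetical.

```python
from collections import defaultdict

HEPTAD = "abcdefg"

# Preferred monomer sequences a) and b), written without spaces.
SAF_P1C = "KIAALKQKIASLKQEIDALEYENDALEQ"
SAF_P2D = "KIRALKAKNAHLKQEIAALEQEIAALEQ"


def heptad_positions(sequence, first_position="g"):
    """Group the residues of a sequence by heptad position (a-g).

    Assumption: the first residue sits at `first_position` (g by default),
    which places Ile/Asn at a and Leu at d for the SAF sequences.
    """
    start = HEPTAD.index(first_position)
    by_position = defaultdict(str)
    for i, residue in enumerate(sequence):
        by_position[HEPTAD[(start + i) % 7]] += residue
    return dict(by_position)


if __name__ == "__main__":
    for name, seq in (("SAF-p1C", SAF_P1C), ("SAF-p2D", SAF_P2D)):
        pos = heptad_positions(seq)
        print(name, "a:", pos["a"], "d:", pos["d"],
              "e:", pos["e"], "g:", pos["g"])
```

Under this assumed register, each peptide presents one asparagine and three isoleucines at a, leucine at every d, and lysine or glutamic acid at every e and g position.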

According to another aspect of the invention, there is provided a method of producing protein 
structures, the method comprising providing a mixture of first and second peptide monomer 
units which associate to form a protein structure according to the invention. The structure can 
be derivatised and/or stabilized by cross-linking. 

Derivatization of the peptide monomer units before or after assembly into the protein 
structures of the invention may be performed. For example, fluorescent moieties 
(fluorophores) may be attached to the coiled coil as described in WO99/11774. The addition 
of fluorescent moieties may assist visualization of the protein structure. Substitution with 
functional groups at the "f" position in the heptad repeat is especially preferred as that 
position is on the outside of the helix (see Fig. 1C and 1E). Other derivatives may include 
attaching binders to the peptide monomer units, for example so that units which can bind other 
entities can be produced. 

The first and second peptide monomers and the strands may have the characteristics described 
above. 

The invention also provides protein fibres produced by an association of protein structures 
according to the invention. 

The protein structures may also be arranged to form tubular structures. In particular, the 
structures may be arranged to form nanotubes. 

According to another aspect of the invention, there is provided a kit for making protein 
structures, the kit comprising first and second peptide monomer units which associate to form 
a protein structure or protein fibres according to the invention. 

The protein structures of the invention may be assembled in two and three dimensional arrays. 
For example, two-dimensional mats can be formed which can function, for example, as filters. 
Three-dimensional grids or matrices can also be formed, again for example for use as sieves 
or filters, or for organising other associated or conjugated molecules in three dimensions. 

In a preferred embodiment, a matrix is assembled in situ. For example, a matrix can be 
formed in a solution to entrap contaminants in the solution and then the matrix, together with 
contaminants, can be removed from the solution for example by centrifugation. 

The stability of the protein structures at higher temperatures may be improved by making the 
peptide monomers longer, such that the overlap between corresponding first and second 
monomer unit residues is increased. Increases in monomer length have previously been 
shown to stabilize coiled coil structures. Alternatively, stability can be improved by 
introducing bonding between adjacent peptide monomer units in the same strand. For 
example, Kent and co-workers (Dawson et al (1994) Science 266: 776) have produced peptide 
bonds between adjacent polypeptide units by coupling and subsequent rearrangement of a 
cysteine residue at the N-terminus of one polypeptide unit to a thio-ester derivatised C-terminus of 
another unit. 

Additionally, the protein structures may be stabilised and derivatised by using them to 
template the polymerisation of synthetic polymers. 

Definitions 

The terms used in the specification are to be given the ordinary meaning attributed to them by 
the skilled addressee. The following is given by way of clarification: 

Amino acid. 

This term embraces both naturally-occurring amino acids and synthetic amino acids as well as 
naturally-occurring amino acids which have been modified in some way to alter certain 
properties such as charge. In all cases references to naturally-occurring amino acids may be 
considered to include synthetic amino acids which may be substituted therefor. 

Coiled Coil 

A coiled-coil is a peptide/protein sequence usually with a contiguous pattern of hydrophobic 
residues spaced 3 and 4 residues apart, which assembles (folds) to form a multimeric bundle 
of helices. Coiled-coils including sequences with multiple offset repeats are also 
contemplated. 

Dimer 

A dimer is a two-stranded structure. 

Heterodimer 

A heterodimer is a dimeric structure formed by two different strands. 

Staggered heterodimer 

A staggered heterodimer is a structure in which the two strands assemble to leave overlapping 
ends that are not interacting within the heterodimer. 

Blunt-end assembly 

Blunt-end assembly is association where the two strands combine to give flush, i.e. 
non-overlapping, ends. 

Protofibril 

A protofibril is a protein structure assembled longitudinally from staggered heterodimers 
interacting through their overhanging ends. 

Fibre 

A fibre is a structure formed by lateral association of two or more protofibrils. 

Protein structures and methods of producing protein structures in accordance with the 
invention will now be described, by way of example only, with reference to the accompanying 
Figures 1 to 8 in which: 

Fig. 1 illustrates the design and the sequences of self-assembling fibre (SAF) peptide 
monomers of the invention. 

Fig. 2 illustrates computer modelling of the designed self-assembling fibre of the invention. 

Fig. 3 illustrates the results of circular dichroism (CD) and linear dichroism (LD) experiments 
on protein structures of the invention. 

Fig. 4 illustrates the assembly of synthetic protein fibres visualized directly by transmission 
electron microscopy and an analysis of fibre width. In all panels, the white scale bars represent 
100 nm. Fig. 4D is a histogram showing the distribution of fibre widths determined using 
TEM for fresh (white bars) and matured (black bars) mixtures of SAF peptides at 100 µM (a 
width value of "x" on the histogram includes all measurements made from "(x-5) to x"). 

Fig. 5 is a cartoon showing the possible anti-typic association of parallel helical peptides 
leading to a homo-oligomeric peptide nanotube. 

Fig. 6 is an x-ray diffraction pattern of an aligned protein fibre of the invention. 

Fig. 7 is an electron micrograph showing fibres which have been derivatised through the 
inclusion of fluorophores; and 

Fig. 8 shows amino acid sequences designed to form blunt-ended heterodimers. 

1) Peptide Design and Synthesis 

Various peptide monomer units were designed, as described above. The monomers and 
capping peptides (designed to complement the sticky ends of the monomers so as to produce 
flush, or blunt, ends and so arrest longitudinal fibre assembly) are set out in Table 1: 

Fig. 1(A) shows a mechanism for self-assembly: complementary charges in "companion" 
peptides direct the formation of staggered, parallel heterodimers; the resulting "sticky" ends 
are also complementary and promote longitudinal association into extended structures. Fig. 
1(B) shows the designed amino acid sequences: each peptide comprised canonical heptad 
repeats (abcdefg) with Ile at a and Leu at d to guide the formation of coiled-coil dimers; 
oppositely-charged residues were incorporated at e and g to favour the staggered dimer with 
sticky ends; asparagine residues, which preferentially pair with each other at a sites 
(Gonzalez L et al (1996) Nature Structural Biology 3, 13: 1011-1018), were included to 
cement the prescribed register further and to favour the parallel structures. Fig. 1(C) is a 
helical-wheel representation, summarizing the designed sequences in context. The view is 
from the N-terminus with heptad sites labeled a-g and assumes 3.5 residues per helical turn to 
emphasise the heptad repeat. 

The peptides were synthesized on an Applied Biosystems 432A Peptide Synthesizer using 
solid-phase methods and Fmoc chemistry. Peptide samples were purified using 
reversed-phase HPLC and their identities confirmed by MALDI-TOF mass spectrometry. 

Various combinations of peptide monomers and capping peptides were tested as set out in 
Table 2: 

In addition and as a control, the SAF-p1C sequence was permuted (N- and C-terminal halves 
were swapped) to produce peptide SAF-p3: 

EIDALEYENDALEQKIAALKQKIASLKQ 

This design should combine with SAF-p2D to form a blunt-ended structure, which should not 
form fibres. 
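
The permutation can be reproduced with a minimal Python sketch, given purely as an illustration. The helper name `swap_halves` is hypothetical; the sequences are those of SAF-p1C and SAF-p3 written without spaces.

```python
SAF_P1C = "KIAALKQKIASLKQEIDALEYENDALEQ"
SAF_P3 = "EIDALEYENDALEQKIAALKQKIASLKQ"  # control sequence as listed above


def swap_halves(sequence):
    """Return the sequence with its N- and C-terminal halves exchanged."""
    if len(sequence) % 2:
        raise ValueError("sequence length must be even to swap equal halves")
    half = len(sequence) // 2
    return sequence[half:] + sequence[:half]


if __name__ == "__main__":
    # Swapping the two 14-residue halves of SAF-p1C yields SAF-p3.
    print(swap_halves(SAF_P1C) == SAF_P3)
```

Because each half spans exactly two heptads, the swap leaves the heptad register unchanged while moving the negatively charged half of the peptide to the N-terminal end.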

2) Modeling of Protein Fibre Structure 

A model of the three-dimensional structure of the designed protein fibre resulting from the 
assembly of SAF-p1 and SAF-p2 was made from the minimised structure of a model 
coiled-coil 35-mer, (LAALAAA)5, which was generated using Crick's Equation and had an 
ideally packed interface (G. Offer and R. Sessions, J. Mol. Biol. 249, 967 (1995)). Copies of 
the 35-mer were superimposed with an overlap of one heptad repeat to extend the structural 
template, and the backbone was rejoined after removal of overlapping segments. Residues in 
the two-stranded template were replaced with the sequences of the SAF peptides, staggered 
relative to each other by two heptad repeats according to the alignment in Fig. 1B. The 
structure was soaked in a 5 Å layer of water and energy minimised until the average absolute 
derivative of energy with respect to coordinates fell below 0.01 kcal Å⁻¹. The structure was 
built and visualized using Insight II 97.0 (Molecular Simulations Inc.), and was 
energy-minimized using Discover 2.9.8 (Molecular Simulations Inc.) with the consistent 
valence forcefield. In Fig. 2(A) peptides SAF-p1 and SAF-p2 (each coloured dark 
grey-to-light grey from the N-terminus) interact through core residues including asparagine 
pairs (coloured mid-grey) to form the two strands of a staggered, parallel, coiled-coil fibre. In 
Fig. 2(B), negatively charged glutamate side chains (coloured light grey) and positively 
charged lysine side chains (coloured black) form complementary charge interactions between 
the SAF peptides. 
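
The two-heptad stagger can be illustrated on the sequences themselves. The Python sketch below is not part of the modelling procedure: it assumes that residue i of SAF-p2 faces residue i + 14 of SAF-p1, a direction of stagger chosen here because it brings the two asparagine residues into register, and it uses a crude charge count (Lys/Arg = +1, Asp/Glu = -1, His ignored). All helper names are hypothetical.

```python
SAF_P1 = "KIAALKQKIASLKQEIDALEYENDALEQ"
SAF_P2 = "KIRALKAKNAHLKQEIAALEQEIAALEQ"
STAGGER = 14  # two heptad repeats

CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(segment):
    """Crude net charge: Lys/Arg +1, Asp/Glu -1, all other residues 0."""
    return sum(CHARGE.get(residue, 0) for residue in segment)


def facing_pairs(strand1, strand2, stagger):
    """Pair residue i + stagger of strand1 with residue i of strand2."""
    overlap = len(strand1) - stagger
    return [(strand1[i + stagger], strand2[i]) for i in range(overlap)]


if __name__ == "__main__":
    pairs = facing_pairs(SAF_P1, SAF_P2, STAGGER)
    asn = [i for i, (x, y) in enumerate(pairs) if x == y == "N"]
    print("Asn-Asn pair at overlap index:", asn)
    # Halves that overlap within one staggered heterodimer.
    print("SAF-p1 C-terminal half:", net_charge(SAF_P1[STAGGER:]),
          "| SAF-p2 N-terminal half:", net_charge(SAF_P2[:STAGGER]))
    # Overhanging ("sticky") halves that meet the next heterodimer.
    print("SAF-p1 N-terminal half:", net_charge(SAF_P1[:STAGGER]),
          "| SAF-p2 C-terminal half:", net_charge(SAF_P2[STAGGER:]))
```

On this simple count the overlapping halves carry opposite net charges, as do the two overhanging halves, which is consistent with the complementary charge interactions described for Fig. 2(B).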

3) Circular Dichroism Experiments 

Peptide samples were incubated at 5°C in 10 mM MOPS (3-(N-Morpholino)propanesulfonic 
acid), pH 7. Sample concentrations were determined from their UV absorbance at 280 nm 
(SAF-p1) and 214 nm (SAF-p2). After baseline correction, ellipticities in mdeg were 
converted to molar ellipticities (deg cm² dmol-res⁻¹) by normalizing for the concentration of 
peptide bonds. Data were recorded in a cell of 1 mm path length by integrating the signal for 
5 s (and 1 s for the fresh 100 µM peptide mixture) every nm in the range 205-260 nm. CD 
measurements were made using a JASCO J-715 spectropolarimeter fitted with a Peltier 
temperature controller. 
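
The normalization described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: it applies the standard relation [θ] = θ(mdeg) / (10 · c · l · n), with c in mol/l, l in cm and n the number of peptide bonds (taken here as 27 for a 28-residue peptide). The ellipticity value in the example is hypothetical.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_peptide_bonds):
    """Convert an observed ellipticity (mdeg) to deg cm^2 dmol-res^-1.

    Standard relation: [theta] = theta_mdeg / (10 * c * l * n), where c is
    the peptide concentration in mol/l, l the path length in cm and n the
    number of peptide bonds per chain.
    """
    if conc_molar <= 0 or path_cm <= 0 or n_peptide_bonds <= 0:
        raise ValueError("concentration, path length and n must be positive")
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_peptide_bonds)


if __name__ == "__main__":
    # Hypothetical reading of -27 mdeg for a 100 uM peptide in a 1 mm cell.
    value = mean_residue_ellipticity(-27.0, 100e-6, 0.1, 27)
    print(round(value))
```

For a mixture of two peptides, the concentration passed to the function would be the total peptide concentration, so that the result remains normalized per peptide bond.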

The CD data shown in Fig. 3 provide spectroscopic evidence for the formation of helical 
structures by the SAF peptides. Fig. 3(A) shows circular dichroism (CD) spectra at 10 µM 
for: SAF-p1 (- — ), SAF-p2 (- -), the average of these spectra ( ), and the experimental SAF 
peptide mixture (o). Fig. 3(B) shows CD spectra at 100 µM; the key is the same as for Fig. 
3(A), but with the additional spectrum (•) being for the SAF peptide mixture after 
"maturation" for 1 h. 

Consistent with our design, neither SAF-p1 nor SAF-p2 was highly structured in aqueous 
solution at pH 7 and 5 °C (Fig. 3). However, when mixed in equal proportions the circular 
dichroism (CD) spectrum changed and, moreover, was markedly different from the 
theoretical spectrum generated by averaging the spectra for the isolated peptides. In 
particular, the spectrum for the mixture had intense minima at 208 and 222 nm consistent 
with the formation of α-helical structure, but these features were not as pronounced in the 
spectra of the individual peptides. This was clear evidence that the two peptides interacted to 
form an α-helical structure as designed. Furthermore, and as expected for a multimerization 
event, the magnitude of these spectral changes depended on peptide concentration: a SAF 
mixture with 10 µM of each peptide did show a weak signal indicative of some α-helical 
structure, but a 100 µM mixture gave a much stronger signal (Figs. 3A&B). 

The shape and intensity of spectra from 100 µM mixtures of the SAF peptides also changed 
with time (Fig. 3B). Spectra recorded immediately after mixing a "fresh" sample displayed 
some α-helical structure. After incubation of the mixture for 1 hour at 5 °C ("maturation"), 
however, the signal at 222 nm was more intense, and indicated approximately 75 % α-helix, 
consistent with substantial coiled-coil formation. 

Maturation of 100 µM SAF peptide mixtures was also accompanied by slight clouding of the 
samples. Scattering effects from such samples can lead to attenuation and distortion of CD 
spectra (D. Mao and B. A. Wallace, (1984) Biochemistry 23, 2667). However, we could 
[...] the CD instrument did not affect the shape or the intensity of the spectrum. Furthermore, we 
established that the majority of the CD signal from the mixtures derived from the suspended 
material: a supernatant without the suspended material, which was recovered by 
centrifugation of a matured 100 µM SAF mixture, gave only a weak CD signal similar to the 
10 µM mixture. 

Thus, the CD data were wholly consistent with the desired α-helical SAF design and, 
moreover, indicated the formation of large assemblies. 

As a control, SAF-p3 (the permutation of SAF-p1 (identical to SAF-p1C)) was designed to 
form a blunt-ended heterodimer with SAF-p2 that should not assemble further into fibres. 
100 µM mixtures of SAF-p2 (identical to SAF-p2D) and SAF-p3 were analysed by 
sedimentation equilibrium in the analytical ultracentrifuge. The resulting data were best 
fitted assuming a single ideal species in solution, and the molecular weight was allowed to 
vary during the fit. An Mr of 6422 (with 95% confidence limits of 5924 and 6911) was 
obtained, which is very close to the expected heterodimer value of 6303 calculated from mass 
spectrometry of the individual peptides. CD spectra for 100 µM fibre-producing mixtures 
(SAF-p1 with SAF-p2), and for blunt dimer-producing mixtures (SAF-p2 with SAF-p3), were 
recorded. For the blunt dimer-producing mixtures, the shape and intensity of the CD 
spectrum were fully consistent with coiled-coil formation as designed. In contrast to the 
fibre-producing mixtures, the blunt dimer-producing mixtures showed no signs of maturation; 
that is, negligible spectral changes and no clouding of solutions occurred upon incubation. 
Interestingly, the intensity of the minimum near 222 nm, which is an accepted indicator of 
α-helical structure and degree of α-helical folding, was similar for both mixtures. This 
strongly supports the formation of α-helical structure as designed in the fibre-producing 
mixtures despite the spectral shifts observed upon maturation. 
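
As a cross-check on the expected heterodimer mass, the value can also be estimated from the sequences alone. The Python sketch below is illustrative only: it assumes linear peptides with unmodified (free) N- and C-termini and uses standard average residue masses, whereas the value of 6303 quoted above was obtained from mass spectrometry.

```python
# Standard average residue masses (Da) for the residues found in the
# SAF-p2 and SAF-p3 sequences.
RESIDUE_MASS = {
    "A": 71.0788, "D": 115.0886, "E": 129.1155, "H": 137.1411,
    "I": 113.1594, "K": 128.1741, "L": 113.1594, "N": 114.1038,
    "Q": 128.1307, "R": 156.1875, "S": 87.0782, "Y": 163.1760,
}
WATER = 18.0153

SAF_P2 = "KIRALKAKNAHLKQEIAALEQEIAALEQ"
SAF_P3 = "EIDALEYENDALEQKIAALKQKIASLKQ"


def average_mass(sequence):
    """Average mass of a linear peptide, assuming free N- and C-termini."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


if __name__ == "__main__":
    heterodimer = average_mass(SAF_P2) + average_mass(SAF_P3)
    print(f"calculated heterodimer mass: {heterodimer:.1f} Da")
```

Under these assumptions the calculated sum lies within about 1 Da of the quoted value and inside the 95% confidence limits of the sedimentation-equilibrium fit.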

4) Linear Dichroism Experiments 

Linear dichroism (LD) spectroscopy was also used to test if elongated structures were being 
formed as designed. Long polymers such as DNA molecules can be oriented by shear flow. 
This effect can be monitored by LD spectroscopy provided that chromophores also become 
aligned by the flow (M. Bloemendal (1994) Chem. Soc. Rev. 23, 265; A. Rodger and B. 
Norden (1997) Oxford Chemistry Masters (Oxford University Press, Oxford), vol. 1). 

Peptide samples were prepared for LD as for CD. LD data were collected on samples 
spinning in a Couette flow cell by integrating the signal for 2 s every nm in the range 210-320 
nm, using a JASCO J-715 spectropolarimeter. After baseline correction, absorbance was 
converted to molar extinction coefficient (l mol-res⁻¹ cm⁻¹) by normalizing for the 
concentration of peptide bonds. A linear correction for a sloping baseline was made to the 
data from the 100 µM SAF peptide mixture. 

The results are depicted in Fig. 3D, which shows linear dichroism (LD) spectra for: 20 µM 
tropomyosin ( ), the SAF peptide mixture at 10 µM ( — ), and the SAF peptide mixture at 
100 µM in the absence (•) and presence (o) of 0.5 M KF. 

For instance, we found that tropomyosin, which forms a dimeric coiled coil approximately 42 
nm in length, could be aligned to give an LD signal (Fig. 3D). In contrast, and consistent with 
our design and the CD data, LD signals were not detected for a 10 µM SAF mixture (Fig. 3D) 
or for the individual peptides at 100 µM (data not shown). However, a 
matured 100 µM SAF peptide mixture gave a strong absorbance from the peptide backbone 
(210-240 nm) and some signal in the aromatic region (260-290 nm) during flow orientation 
(Fig. 3D). As only long structures are aligned by this technique, the data demonstrated that 
long fibres at least 500 nm in length were present in solutions of the matured 100 µM SAF 
peptide mixtures. 

5) Electron Microscopy 

To confirm fibre assembly, we used electron microscopy to visualize structures in the peptide 
preparations directly. For TEM experiments, peptide samples were incubated for 1 h at 5 °C 
in filtered 10 mM MOPS, pH 7. A drop of peptide solution was applied to a carbon-coated 
copper specimen grid (Agar Scientific Ltd, Stansted, UK), and dried with filter paper before 
negative staining with 0.5% aqueous uranyl acetate and then dried at 5 °C. A "fresh" SAF 
peptide mixture was prepared by mixing preincubated solutions of the individual peptides at 
200 µM directly on the specimen grid, before drying and negative staining as described. 
Grids were examined in a Hitachi 7100 TEM at 100 kV and digital images were acquired 
with a (800 x 1200 pixel) charge-coupled device camera (Digital Pixel Co. Ltd., Brighton, 
UK) and analyzed (Kinetic Imaging Ltd., Liverpool, UK). 

For scanning electron microscopy (SEM) experiments, negatively-stained specimen grids 
were sputter-coated with gold and examined in a Leo Stereoscan 420 SEM at 20 kV and with 
a probe current of 10 pA. 

No structures were visible up to 100 000 times magnification by transmission electron 
microscopy (TEM) for either the 10 µM SAF mixture, or for the individual peptides at 100 
µM concentration (data not shown). However, TEM of a 100 µM SAF mixture at 50 000 
times magnification revealed time-dependent formation of long fibrous structures, consistent 
with the CD and LD data. Fresh mixtures showed large numbers of extended fibres of 
various widths. The majority of these had a diameter of about 20 nm (Fig. 4A (a fresh 
mixture at 100 µM) & Fig. 4D); finer fibres were present, but their widths could not be 
measured reliably. Images recorded for the matured mixtures showed fewer fibres, but these 
were more distinct and thicker than those observed in the fresh mixture (Fig. 4B&D). 

Scanning electron microscopy (SEM) of a matured mixture showed no evidence for fibre 
branching. Rather, the fibres were simply intertwined as if layered on top of each other (Fig. 
4C). It was not possible to follow the full length of fibres due to intertwining, but they were 
at least several hundred microns in length. Although the density of fibres varied across the 
surface of the EM grid, for the matured samples at least, their diameters were quite uniform 
with a mean width of 43.3 (SD = 9.3) nm (Fig. 4D). As the original design was for a 
longitudinally extended, but otherwise two-stranded, coiled coil, the average diameter that we 
might have expected was about 2 nm. Therefore, the EM data suggested that the designed 
two-stranded coiled-coil fibres associate laterally into higher order assemblies. 

6) X-ray Fibre Diffraction 

Mixtures of SAF peptides at 500 µM in 10 mM MOPS, pH 7, were incubated on ice for at 
least 1 h, before centrifugation at 6500g for 5 min. Droplets of fibre-containing solutions, 
taken from the bottom of the centrifuged tubes, were suspended between the ends of two 
wax-filled capillaries and allowed to dry slowly overnight at 4°C, yielding clumps of partially 
aligned fibres. X-ray fibre diffraction images were collected using a Rigaku CuKα rotating 
anode source (wavelength 1.5418 Å) and an R-AXIS IV detector. Samples were maintained at 
5°C during data collection with cool air from a cryostream (Oxford Cryo-systems). The X-ray 
fibre diffraction pattern collected from SAF peptide fibres showed the following features 
(Fig. 6): (1) a short meridional (that is, parallel to the long fibre axis) reflection at 5.11 ± 0.03 
Å; (2) the harmonic of this 5.11 Å reflection at 10.19 ± 0.05 Å; and (3) a stronger, more diffuse 
reflection centered at 8.8 ± 0.15 Å on the equator. These features are consistent with 
α-helical coiled-coils aligned with the fibre axis. The 5.1 Å meridional reflection 
corresponds to the pitch of the helices within the coiled-coils. The other expected reflection 
on the meridian (that is, that at 1.5 Å and corresponding to the rise per residue) lies out of the 
resolution of the current data sets, whereas the equatorial reflection reveals the mean distance 
between α-helical axes. This value at 8.8 Å is less than the observed value for keratin but 
falls within reported ranges for dimeric coiled-coil peptides. 
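
To indicate where these reflections fall, the spacings can be converted to scattering angles with Bragg's law, λ = 2d sin θ (first order). The Python sketch below is illustrative only; the function name is hypothetical and nothing is assumed about the detector geometry, which is not given here.

```python
import math

WAVELENGTH_A = 1.5418  # CuK-alpha wavelength in angstroms, as stated above


def scattering_angle_deg(d_spacing_a, wavelength_a=WAVELENGTH_A):
    """Return the scattering angle 2*theta (degrees) from Bragg's law, n = 1."""
    sin_theta = wavelength_a / (2.0 * d_spacing_a)
    if not 0.0 < sin_theta <= 1.0:
        raise ValueError("spacing cannot be observed at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


if __name__ == "__main__":
    for label, d in (("meridional 10.19 A", 10.19),
                     ("equatorial 8.8 A", 8.8),
                     ("meridional 5.11 A", 5.11),
                     ("rise per residue 1.5 A", 1.5)):
        print(f"{label}: 2-theta = {scattering_angle_deg(d):.1f} deg")
```

The 1.5 Å spacing corresponds to a scattering angle several times larger than that of the 5.11 Å reflection, which is consistent with it lying outside the resolution of the data sets described.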

7) Effect of Potassium Fluoride on Protein Fibre Assembly 

Molecular modeling of the SAF sequences into an extended two-stranded coiled coil also 
highlighted potential complementary charge interactions on the surface of the protofibrils 
(Figs. 1&2). In accordance with this, experimentally it was found that moderate concentrations 
of salt inhibited protofibril and thick fibre assembly. First, CD spectra recorded for both the 
individual peptides and a 100 µM mixture of SAF peptide samples with 0.5 M potassium 
fluoride showed reduced helical CD signals and there was no evidence of "maturing" in the 
mixed samples (Fig. 3C). Second, the LD signal described previously for the matured 100 
µM SAF peptide mixture was also lost when the experiment was repeated in the presence of 
salt (Fig. 3D). Finally, TEM images of a 100 µM SAF mixture also demonstrated that fibres 
were not formed in the presence of 0.5 M KF (Fig. 4E). Fig. 4E shows the results of TEM of 
a matured SAF peptide mixture at 100 µM incubated in the presence of 0.5 M KF. 

The inventors did not knowingly design any features into the SAF peptides to foster further 
association of the two-stranded coiled coils. The observation of thick fibres in SAF peptide 
preparations, therefore, raised the question: what interactions guided and stabilized these 
higher-order assemblies? The inventors therefore propose that features inherent in repeating 
structures of the type that they designed will naturally promote such fibre assembly 
(fibrillogenesis). 

Consider a protofibril as depicted in Figs. 1B and 2A. Any sequence feature presented on its 
surface by either or both of the constituent peptides will be repeated at regular intervals along 
the protofibril. The repeat length will be equal to the length of the peptides (for SAF-pl and 
SAF-p2 this was 28 residues, or about 4.2 nm). Furthermore, the motif will spiral around the 
protofibril tracking the superhelix of the coiled coil, which has a pitch of about 15 nm for a 
contiguous, heptad-based, dimeric structure. In this scenario, protofibril-protofibril 
interactions may be promoted if another sequence motif complementary to the first is present 
in the potential partner. This is because the pitches of the complementary motifs on each 
protofibril will match precisely. Thus, once initiated, lateral association of protofibrils 
(that is, fibrillogenesis) will be cemented by many regularly spaced interactions as in a 
crystal. As a result, the complementary interactions need only be weak as the stability of the 
protofibril-protofibril interaction rests on an avidity effect rather than a small number of 
strong interactions. Provided that the components of the assembly can make more than one 
type of complementary surface, very extensive molecular assemblies may result. 
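
The repeat geometry described above can be put into numbers with a short Python sketch, offered only as an illustration. It assumes an axial rise of about 0.15 nm per residue and the superhelical pitch of about 15 nm quoted above; the angle through which a surface motif spirals between successive repeats is a derived quantity introduced here, and the function names are hypothetical.

```python
RISE_PER_RESIDUE_NM = 0.15   # assumed axial rise per residue (about 1.5 angstroms)
SUPERHELIX_PITCH_NM = 15.0   # approximate pitch quoted for a dimeric coiled coil


def repeat_length_nm(n_residues, rise_nm=RISE_PER_RESIDUE_NM):
    """Axial distance between successive copies of a surface motif."""
    return n_residues * rise_nm


def rotation_per_repeat_deg(n_residues, pitch_nm=SUPERHELIX_PITCH_NM):
    """Angle by which the motif spirals around the protofibril per repeat."""
    return 360.0 * repeat_length_nm(n_residues) / pitch_nm


if __name__ == "__main__":
    print(f"repeat length: {repeat_length_nm(28):.1f} nm")
    print(f"rotation per repeat: {rotation_per_repeat_deg(28):.1f} degrees")
```

For the 28-residue SAF peptides this reproduces the repeat length of about 4.2 nm given above.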

The inventors used electrostatic interactions both to direct heterodimer formation, and to 
promote elongation of the protofibrils (Figs. 1 and 2). These features would also create 
periodic and alternating patches of charge in the protofibrils provided they are regular as 
envisaged (Figs. 1B and 2B). These charged patches could guide and stabilize the higher order 
assemblies. Indeed, similar features have been noted in several natural fibrous proteins and 
have been implicated in the assembly of multi-protein filaments (J. J. Meng et al (1994) J. Biol. 
Chem. 269, 18679; A. D. McLachlan and M. Stewart (1976) J. Mol. Biol. 103, 271), and small 
synthetic peptide systems (S. G. Zhang et al (1993) Proc. Natl Acad. Sci. U.S.A. 90, 3334). The 
experiments with salt (KF) described above suggest that salt-bridges (electrostatic interactions) 
may be at least in part the cause of fibrillogenesis. 

8) Coiled-coils design 

a. For two superimposed heptads there are three possible sequence offsets of 1, 2 and 3 
residue(s), which are equivalent to 6-, 5- and 4-residue offsets, respectively. For a regular 
3.6-residue-per-turn α-helix, these set up two hydrophobic faces with angular offsets of 
100°, 160° (360-200) and 60° (360-300), respectively, around the outside of the helix. 
This is best seen on a helical wheel. Accounting for helical supercoiling, i.e. assuming 
3.5 residues per turn and using the accepted helical-wheel representation for the 
coiled-coil, these angular offsets are altered to 103°, 154° and 51°, respectively. However, 
both sets of angles are over-simplifications when considering helix-helix interactions in 
actual coiled-coil systems because side-chain size, geometry and packing also affect the 
helix interfaces (Harbury, P. B. et al (1993) Science 262, 1401-1407; Harbury, P. B. et al 
(1994) Nature 371, 80-83; Malashkevich, V. N. et al (1996) Science 274, 761-765). 
Nonetheless, we found that many natural coiled-coil assemblies, at least, were consistent 
with the approximate angular offsets: trimers could be considered as having overlapping 
heptads separated by 3 residues (angular offset = 51/60°), whereas tetrameric and 
pentameric coiled-coils were often variations on a theme with two heptad repeats offset by 
1 residue (100/103°). 
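
The angular offsets quoted in this section follow from simple arithmetic, which can be sketched as below. This is only an illustrative calculation: the function name angular_offset is hypothetical, and the model assumes that each residue advances the helical wheel by 360° divided by the number of residues per turn, with the separation measured the short way round.

```python
def angular_offset(residue_offset: int, residues_per_turn: float) -> float:
    """Return the smaller angle (degrees) between two hydrophobic seams whose
    heptads are shifted by `residue_offset` residues along the sequence."""
    degrees_per_residue = 360.0 / residues_per_turn
    angle = (residue_offset * degrees_per_residue) % 360.0
    # Measure the separation the short way round the helical wheel.
    return min(angle, 360.0 - angle)


# 3.6 residues per turn: regular alpha-helix; 3.5: supercoiled (assumed values
# taken from the text above).
for label, residues_per_turn in (("regular helix", 3.6), ("supercoiled", 3.5)):
    offsets = [round(angular_offset(k, residues_per_turn)) for k in (1, 2, 3)]
    print(label, offsets)
```

For sequence offsets of 1, 2 and 3 residues this gives 100°, 160° and 60° at 3.6 residues per turn, and 103°, 154° and 51° at 3.5 residues per turn, in agreement with the values given above.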

b. Two heptad repeats offset by two residues: α-cylinder constructions 

Sequence offsets of 2 residues are potentially more interesting than the 1- and 3-residue 
offsets. This is because of the possibility of placing hydrophobic (H) residues at a, c, d 
and f, with c and f effectively making up the a' and d' positions of the second, offset 
heptad. This is represented below, where P signifies polar (non-core) residues. 

a b c d e f g a b c d e f g    repeat 1 
H P P H P P P H P P H P P P    binary pattern 1 
P P H P P H P P P H P P H P    binary pattern 2 
f'g'a'b'c'd'e'f'g'a'b'c'd'e'   repeat 2 
a b c d e f g a b c d e f g    assigned register 
H P H H P H P H P H H P H P    overall binary pattern 
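
The superposition shown above can be reproduced with a few lines of code. The sketch below is illustrative only; the helper names binary_pattern, shift_right and superimpose are hypothetical, and the second heptad is modelled simply as the first pattern moved two residues along the sequence.

```python
HEPTAD = "abcdefg"


def binary_pattern(hydrophobic_positions: str, n_repeats: int = 2) -> str:
    """H at the listed heptad positions, P (polar) everywhere else."""
    register = HEPTAD * n_repeats
    return "".join("H" if pos in hydrophobic_positions else "P" for pos in register)


def shift_right(pattern: str, offset: int) -> str:
    """Cyclically move a repeating pattern `offset` residues along the sequence."""
    return pattern[-offset:] + pattern[:-offset]


def superimpose(first: str, second: str) -> str:
    """A position is hydrophobic if it is hydrophobic in either pattern."""
    return "".join("H" if "H" in pair else "P" for pair in zip(first, second))


pattern_1 = binary_pattern("ad")         # HPPHPPPHPPHPPP
pattern_2 = shift_right(pattern_1, 2)    # PPHPPHPPPHPPHP
overall = superimpose(pattern_1, pattern_2)
print(overall)                           # HPHHPHPHPHHPHP
```

The overall pattern is the same as that obtained by placing H directly at positions a, c, d and f of the assigned register.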

Such sequence patterns would result in two hydrophobic seams with a wide angular 
separation (154/160°), which would place them roughly on opposite sides of the helix. 
Furthermore, it offers two possibilities for parallel helix-helix packing arrangements: syn, 
where two like faces (i.e. a/d with a/d, or c/f with c/f) from neighbouring helices 
combine to produce an open α-sheet, Fig. 6a; and anti, where a/d faces pair with c/f. In the 
anti-arrangement the structure can close to form an α-cylinder. For antiparallel pairs of 
helices syn-typic association should lead to cylinders, whereas sheets should be formed 

c. A natural α-cylinder 

TolC has two α-barrel-like domains (Koronakis, V. et al (2000) Nature 405, 914-919). 
Both have 12 helices contributed by 3 monomers. In the lower barrel each helix pairs 
with another from the same protomer to form separate supercoiled, antiparallel 
coiled-coils; SOCKET analysis revealed extensive antiparallel knobs-into-holes (KIH) 
interactions within these pairs, but not between them. In contrast, the helices of the upper 
barrel appear to pack more uniformly, albeit with a slant, to describe an α-cylinder. The 
SOCKET output for this part of the structure revealed many fewer KIH interactions than 
found in the lower barrel. Furthermore, KIH interactions were not contiguous around the 
cylinder and, in particular, they were more extensive between helices in the same 
monomer, but less regular between the helices abutting the monomers. In our view, the 
TolC barrel represents a variation of the cylinders formed by protein structures of the 
invention. 

Nevertheless, the inventors were able to assign heptad registers for the helices of the 
upper barrel unambiguously. This revealed knobs at relative a, c, d and f positions and 
syn-typic association of the two seams of adjacent helices, i.e. fully consistent with the theory 
outlined above. 

We believe that it will be possible to construct α-sheets and α-cylinders using helices in 
parallel. The use of parallel helices does have one interesting consequence for the 
construction of α-cylinders, however: as the pairing in these structures will be anti-typic, 
a residues on one helix partner c residues of a neighbouring helix at the same level in the 
structure. Similarly, d and f residues pair at the intervening levels. The result will be that 
successive helices will be translated up the helix and cylinder axes by two residues, which 
is equivalent to ≈3 Å. Thus, attempts to construct α-cylinders from parallel helices will 
give spirals of helices which may or may not close. This is, however, potentially 
extremely interesting as it opens up possibilities for making peptide-based nanotubes as 
described above. 

A second consideration for α-cylinder construction concerns the consequences of helix and 
coiled-coil supercoiling. The upper barrel of TolC has 12 helices. Based on a structure of 
parallel helices with canonical supercoiling, i.e. an angular separation of 154° between the 
two seams in each helix, we calculated that the cylinder should close at 14 helices. 
However, variations in helix number are expected. One reason for this is that helices 
cannot supercoil in two directions simultaneously, and some distortion is required to 
maintain packing at both interfaces. We found structural precedents for this in the Protein 
Data Bank (PDB) where tight knobs-into-holes packing was maintained (Walshaw & 
Woolfson, unpublished); indeed, the central helices of the 3-helix α-sheets are straight, 
Fig. 7b. (N.B. The slanting of the helices in the upper barrel of TolC may offer a 
compromise between straight and supercoiled helices.) Assuming the packing of 
completely straight helices, the angular offset becomes 160° and 18 helices would close a 
cylinder. However, given that, as in 3-, 4- and 5-stranded coiled coils, side chains 
mediate the helix-helix contact angles, other oligomerisation states might be possible 
(Harbury, P. B. et al (1993) Science 262, 1401-1407; Harbury, P. B. et al (1994) Nature 371, 
80-83; Malashkevich, V. N. et al (1996) Science 274, 761-765): we calculate that small 
adjustments in the angular offset between 144° and 162° vary the helix number from 10 to 
20. 
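
The helix numbers quoted above can be reproduced with a simple planar model, offered here only as an illustrative sketch. It assumes that, in cross-section, the helix axes sit at the corners of a regular polygon and that the two seams of each helix point towards its two neighbours, so that the interior angle of the polygon equals the seam separation θ and the number of helices is n = 360/(180 - θ). This geometric reading and the function name helices_to_close are assumptions introduced for illustration; the calculation actually used is not set out in the text.

```python
def helices_to_close(seam_angle_deg: float) -> float:
    """Number of helices needed to close a cylinder, under the assumed
    regular-polygon model in which the interior angle equals the angular
    separation between the two hydrophobic seams of each helix."""
    if not 0.0 < seam_angle_deg < 180.0:
        raise ValueError("seam angle must lie between 0 and 180 degrees")
    return 360.0 / (180.0 - seam_angle_deg)


# Seam separations discussed in the text (degrees).
for angle in (144.0, 154.0, 160.0, 162.0):
    print(angle, round(helices_to_close(angle), 1))
```

Under these assumptions, 160° gives 18 helices, 144° gives 10 and 162° gives 20, while the supercoiled value of about 154° gives a number close to 14.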

9) Formation of Protein Structures 

As mentioned above, the protein structures of the invention may have various applications 
such as in: 

a. Nanotubes 

This can be achieved for example by combining the aforementioned 7- and 11-residue 
repeats with offsets in the sequence. The effect would be to eliminate the overall 
hydrophobic displacement. In other words, alternating heptad and hendecad repeats give 
an 18-residue repeat to match the α-helical repeat; in the α-helix, 18 residues span 5 
helical turns exactly. It may therefore be possible to create a completely closed peptide 
nanotube (Fig. 5 shows part of a nanotube). In the parallel, straight helix case there would 
be 18 helices per turn of the "cylinder", and the rise per turn is 36 residues. Thus, a 
36-residue peptide with a 7-11-7-11 repeat offset by 2 residues should form a spiral of 
helices the ends of which meet to close the tube. Such nanotubes may be particularly 
useful in the production of nanoscale piping and plumbing. The interior of the tube may 
be derivatised to control the flow of different small (2-40 Å) molecules. 
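
The numbers in the nanotube argument can be checked with the short sketch below. It is illustrative only: the function names are hypothetical, 3.6 residues per turn is the regular α-helix value used earlier, and the 2-residue stagger between neighbouring helices is the translation described for parallel, anti-typic packing.

```python
HEPTAD = 7
HENDECAD = 11


def turns_spanned(n_residues: int, residues_per_turn: float = 3.6) -> float:
    """Number of helical turns covered by n_residues of a regular alpha-helix."""
    return n_residues / residues_per_turn


def rise_per_cylinder_turn(helices_per_turn: int, stagger_residues: int) -> int:
    """Axial rise (in residues) after one full turn of the spiral of helices,
    assuming each helix is shifted by stagger_residues relative to the last."""
    return helices_per_turn * stagger_residues


combined_repeat = HEPTAD + HENDECAD           # 18-residue repeat
print(turns_spanned(combined_repeat))         # close to 5 helical turns
print(rise_per_cylinder_turn(18, 2))          # 36 residues
print(2 * combined_repeat)                    # length of a 7-11-7-11 peptide: 36
```

On these assumptions the rise per turn of the spiral (36 residues) equals the length of a 7-11-7-11 peptide, which is the condition under which the ends of the spiral would be expected to meet.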

b. Derivatised and branched peptides and peptide templates 

The self-assembling peptides of the invention are relatively small and synthetically 
accessible. Thus, non-standard derivatisable side chains may be incorporated in them. 
For example, the monomer units can be made with a single cysteine residue at an exterior 
f position. These can be used to couple small molecules and other peptides using 
thiol-based chemistry. A wide variety of thiol-reactive probes are available. In particular, 
the peptides can be tagged with fluorophores. For instance, with one peptide labelled 
with Fluorescein and the other with Rhodamine, fibres visualised by confocal microscopy 
appear green and red, respectively (Fig. 7). There is a possibility for FRET between the 
probes, which may pack closely in the fibres, and this may confuse interpretation. To 
avoid this the tagged peptides can be doped into fresh, assembling SAF mixtures. Having 
available fluorescently labelled peptides and fibres offers another route to tracking 
fibre/network assembly and orientation. 

To generate branched self-assembling fibres "T-shaped" conjugated peptides can be 
made. These are covalent heterodimers made by mixing and coupling together variants of 
two SAF peptides: one with a terminal cysteine and the other having a central cysteine 
residue. The desired products can be purified from the mix of disulphide-linked peptides 
by HPLC. Doping the conjugated ("T") peptides into fresh SAF mixtures should 
propagate fibre assembly in three dimensions as both the "bar" and the "stem" of the "T" 
could become incorporated in, or initiate, fibres. The resulting networks can be visualised 
and characterised by EM. 

Peptide-synthetic diblock copolymer hybrids may be produced. Suitable methods for 
preparing water soluble diblock copolymers using atom transfer radical polymerisation 
are described in X. S. Wang et al Chemical Communications 1817 (1999) and X. S. Wang 
et al Macromolecules 33, 257 (2000). 

The protein fibres of the invention may be used to template and control this 
polymerisation either to produce hybrid fibres or, if the peptide template is subsequently 
disassembled and washed away, to provide routes to water soluble "fishnet" nanotubes. 
Other possibilities include: conjugating polymers onto preassembled peptide fibres; 
conjugating the polymers and peptides prior to fibre assembly; and effecting 
polymerisation on the pre-assembled fibres. 



c. Formation of Matrices 

The protein fibres of the invention may be arranged to form two and three dimensional 
grids and matrices respectively. One application for such matrices is in the purification of 
biological fluids such as blood. An affinity matrix could be assembled (for example in 
situ in blood) to remove blood contaminants such as viruses. In the case of virus removal, 
a binder for the target contaminant (e.g. a peptide or protein with natural or engineered 
affinities for a viral coat protein) can be fused to a peptide monomer unit in the protein 
structure of the invention. The matrix can then be removed from blood along with any 
bound contaminants by light centrifugation. For example, it is estimated that a 100 nm 
length of fibre would have a mass of > 12 MDa which would readily be removed. Such 
affinity matrices have a number of advantages over larger naturally occurring proteins. In 
the assembled matrices any binders are aligned to give high effective avidities for the 
targeted molecules. 



d. Other applications 

Other applications for protein structures in accordance with the invention include: 

i. preparation of organised networks for seeding the crystallisation of biomolecules for 
X-ray crystallography; 

ii. using ordered fibres to promote cell growth for tissue engineering; 

iii. the construction of nanoscale molecular sieves; 

iv. the preparation of nanoscale molecular grids/scaffolds that could be used as supports 
for a variety of functional small or macromolecules; 

v. functionalised grids and networks could be used in, for example, catalysis, 
affinity-sieving/purification of biological fluids and other research solutions, the 
recruitment of endogenous molecules and co-factors to promote tissue repair and 
tissue engineering in general; 

vi. to create novel lab-on-chip technologies, peptide self-assembly could be combined 
with lithography as follows. 

Lithography and related techniques can be used to pattern a variety of surfaces with 
channels, which can be made of a suitable size (e.g. 20-100 nm wide and deep) to 
accommodate peptide fibres. These can then be used to direct the assembly of the 
fibres from solutions mixed directly on the surfaces. Furthermore, using 
well-established chemistry, the inventors envisage functionalising the peptide fibres 
with a variety of small molecules and other proteins. This proposed combination of 
peptide design, self-assembly and lithography should allow the development of 
ordered arrays of functional polymers on specific surfaces. 

vii. Assembled fibres could also be used as fine (therefore, high resolution) tips in AFM 
(atomic force microscopy); the current limit is about 10-25 nm using carbon 
nanotubes. 






WO 01/21646 






PCT/GB00/03576 



Claims 

1. A protein structure comprising a plurality of first peptide monomer units arranged in a 
first strand and a plurality of second peptide monomer units arranged in a second strand 
in which a first peptide monomer unit in the first strand extends beyond a 
corresponding second peptide monomer unit in the second strand in the direction of the 
strands. 

2. A protein structure according to claim 1 in which the strands together form a coiled coil 
structure. 

3. A protein structure according to claim 1 or 2 in which at least one charged amino acid 
residue of a first peptide monomer unit is arranged to attract an oppositely-charged 
amino acid residue of a second peptide monomer unit. 

4. A protein structure according to claim 3 in which the charged amino acid residue is in 
an end portion of the first peptide monomer unit which extends beyond the 
corresponding second peptide monomer unit in the second strand. 

5. A polypeptide structure according to any preceding claim in which at least one strand 
consists solely of first or second peptide monomer units respectively. 

6. A protein structure according to any preceding claim in which the peptide monomer 
units comprise a repeating structural unit. 

7. A protein structure according to claim 6 in which the repeating structural unit 
comprises a heptad repeat motif (abcdefg). 

8. A protein structure according to claim 6 in which the repeating structural unit 
comprises a hendecad repeat motif (abcdefghijk). 

9. A protein structure according to claim 6 having isoleucine or asparagine at position a 
and leucine at position d. 

10. A protein structure according to claim 6 having valine or leucine at positions a and d 
respectively. 

11. A protein structure according to any one of claims 7 to 10 having oppositely-charged or 
otherwise complementary residues at positions g and e of respective monomer units. 

12. A protein structure according to claim 11 in which the oppositely-charged residues are 
glutamic acid and lysine residues or asparagine and aspartic acid residues, or synthetic 
derivatives of these amino acid residues. 

13. A protein structure according to any preceding claim in which the structure is stabilised 
by pairs of asparagine, arginine, lysine or other complementary residues provided by 
corresponding first and second peptide monomer units. 

14. A protein structure according to any preceding claim which is arranged to form a 
tubular structure. 

15. A protein structure according to claim 14 in which the peptide monomer units are offset 
by two or more amino acid positions in sequence whereby the peptide monomer units 
form a cylinder. 

16. A protein structure according to any preceding claim in which the first and second 
peptide monomer units have the sequence: 

a) KIAALKQKIASLKQEIDALEYENDALEQ (SAF-p1) and 

b) KIRALKAKNAHLKQEIAALEQEIAALEQ (SAF-p2) respectively; or 

c) KIAALKQKIAALKQEIDALEYENDALEQ (SAF-p1A) and 

d) KIRALKWKNAHLKQEIAALEQEIAALEQ (SAF-p2C) respectively; or 

e) KIAALKQKIASLKQEIDALEYENDALEQ (SAF-p1C) and 

f) KIRALKWKNAHLKQEIAALEQEIAALEQ (SAF-p2C) respectively. 

17. A peptide monomer unit for use in preparing a protein structure the peptide monomer 
unit having an amino acid sequence selected from: 

a) KIAALKQKIASLKQEIDALEYENDALEQ (SAF-p1); 

b) KIRALKAKNAHLKQEIAALEQEIAALEQ (SAF-p2); 

c) KIAALKQKIAALKQEIDALEYENDALEQ (SAF-p1A); 

d) KIRALKWKNAHLKQEIAALEQEIAALEQ (SAF-p2C); 

e) KIAALKQKIASLKQEIDALEYENDALEQ (SAF-p1C); and 

f) KIRALKWKNAHLKQEIAALEQEIAALEQ (SAF-p2C). 

18. A protein structure or peptide monomer unit according to any preceding claim in which 
at least one amino acid residue is derivatised. 

19. A method of producing protein structures, the method comprising providing a mixture 
of first and second peptide monomer units which associate to form a protein structure 
according to any preceding claim. 

20. A method according to claim 19 in which the protein structure is derivatised. 

21. A method according to claim 19 or 20 in which the protein structure is stabilised by 
cross-linking. 

22. Protein fibres produced by an association of protein structures according to any one of 
claims 1 to 3 or a method according to claim 19, 20 or 21. 

23. A kit for making protein structures, the kit comprising first and second peptide 
monomer units which associate to form a protein structure according to any one of 
claims 1 to 13 or protein fibres according to claim 22. 

24. A two dimensional matrix comprising a protein structure according to any one of 
claims 1 to 13 or protein fibres according to claim 22. 

25. A three dimensional grid comprising a protein structure according to any one of claims 
1 to 13 or protein fibres according to claim 22. 

26. A matrix according to claim 25 which is arranged to assemble in solution. 

27. A matrix according to claim 25 or 26 which is arranged to bind a target entity. 

28. A matrix according to claim 27 which is arranged to bind viruses. 

29. A method of forming a matrix according to any one of claims 25 to 28 in which a 
mixture of separate first and second monomer units is provided and are then caused to 
associate to form a protein structure in accordance with the invention, an accumulation 
of such protein structures assembling in turn to form a three dimensional matrix. 

30. A method according to claim 29 in which the matrix is formed in situ. 

31. A method for controlling the production of a synthetic polymer comprising assembling 
a protein structure in accordance with any one of claims 1 to 16 in association with the 
polymer. 

32. A method according to claim 31 in which the protein structure is removed after 
synthesis of the polymer. 

33. A tip for use in Atomic Force Microscopy comprising a protein structure according to 
any one of claims 1 to 16. 

Peptides 

This invention relates to protein fibre formation and in particular to methods of producing 
protein fibres to form a protein structure comprising a plurality of first polypeptide units 
arranged in a first polypeptide strand and a plurality of second polypeptide units arranged in 
a second polypeptide strand, the strands preferably forming a coiled coil structure, and in 
which a first polypeptide unit in the first strand extends beyond a corresponding second 
polypeptide unit in the second strand in the direction of the strands. 